THE SMALLEST SINGULAR VALUE OF A RANDOM 
RECTANGULAR MATRIX 



MARK RUDELSON AND ROMAN VERSHYNIN 

Abstract. We prove an optimal estimate of the smallest singular value
of a random subgaussian matrix, valid for all dimensions. For an $N \times n$
matrix $A$ with independent and identically distributed subgaussian entries,
the smallest singular value of $A$ is at least of the order $\sqrt{N} - \sqrt{n-1}$ with
high probability. A sharp estimate on the probability is also obtained.



1. Introduction 

1.1. Singular values of subgaussian matrices. Extreme singular values of
random matrices have been of considerable interest in mathematical physics,
geometric functional analysis, numerical analysis and other fields. Consider
an $N \times n$ real matrix $A$ with $N \ge n$. The singular values $s_k(A)$ of $A$ are
the eigenvalues of $|A| = \sqrt{A^* A}$ arranged in nonincreasing order. Of particular
significance are the largest and the smallest singular values

(1.1)  $s_1(A) = \sup_{x:\, \|x\|_2 = 1} \|Ax\|_2, \qquad s_n(A) = \inf_{x:\, \|x\|_2 = 1} \|Ax\|_2.$

A natural matrix model is given by matrices whose entries are independent 
real random variables with certain moment assumptions. In this paper, we 
shall consider subgaussian random variables $\xi$ - those whose tails are dominated
by that of the standard normal random variable. Namely, a random variable $\xi$
is called subgaussian if there exists $B > 0$ such that

(1.2)  $\mathbb{P}(|\xi| > t) \le 2\exp(-t^2/B^2)$ for all $t > 0$.

The minimal $B$ in this inequality is called the subgaussian moment of $\xi$.
Inequality (1.2) is often equivalently formulated as the moment condition

(1.3)  $(\mathbb{E}|\xi|^p)^{1/p} \le CB\sqrt{p}$ for all $p \ge 1$,

where C is an absolute constant. The class of subgaussian random variables 
includes many random variables that arise naturally in applications, such as 
normal, symmetric ±1 and general bounded random variables. 

In this paper, we study $N \times n$ real random matrices $A$ whose entries are
independent and identically distributed mean zero subgaussian random
variables. The asymptotic behavior of the extreme singular values of $A$ is well
understood. If the entries have unit variance and the dimension $n$ grows to
infinity while the aspect ratio $n/N$ converges to a constant $\lambda \in (0,1)$, then

$\frac{s_1(A)}{\sqrt{N}} \to 1 + \sqrt{\lambda}, \qquad \frac{s_n(A)}{\sqrt{N}} \to 1 - \sqrt{\lambda}$

almost surely. This result was proved in [21] for Gaussian matrices, and in [2]
for matrices with independent and identically distributed entries with finite
fourth moment. In other words, we have asymptotically

(1.4)  $s_1(A) \sim \sqrt{N} + \sqrt{n}, \qquad s_n(A) \sim \sqrt{N} - \sqrt{n}.$

Considerable efforts were made recently to establish non-asymptotic
estimates similar to (1.4), which would hold for arbitrary fixed dimensions $N$ and
$n$; see the survey [13] on the largest singular value, and the discussion below
on the smallest singular value.

Estimates in fixed dimensions are essential for many problems of geometric 
functional analysis and computer science. Most often needed are upper bounds 
on the largest singular value and lower bounds on the smallest singular value, 
which together yield that $A$ acts as a nice isomorphic embedding of $\mathbb{R}^n$ into
$\mathbb{R}^N$. Such bounds are often satisfactory even if they are known to hold up to
a constant factor independent of the dimension. 

The largest singular value is relatively easy to bound above, up to a constant
factor. Indeed, a standard covering argument shows that $s_1(A)$ is at most of
the optimal order $\sqrt{N}$ for all fixed dimensions, see Proposition 2.3 below. The
smallest singular value is significantly harder to control. The efforts to prove
optimal bounds on $s_n(A)$ have a long history, which we shall now outline.

1.2. Tall matrices. A result of [3] provides an optimal bound for tall matrices,
those whose aspect ratio $\lambda = n/N$ satisfies $\lambda < \lambda_0$ for some sufficiently small
constant $\lambda_0 > 0$. Recalling (1.4), one should expect that tall matrices satisfy

(1.5)  $s_n(A) \ge c\sqrt{N}$ with high probability.

It was indeed proved in [3] that for tall $\pm 1$ matrices one has

(1.6)  $\mathbb{P}(s_n(A) \le c\sqrt{N}) \le e^{-cN},$

where $\lambda_0 > 0$ and $c > 0$ are absolute constants.

1.3. Almost square matrices. As we move toward square matrices, thus
making the aspect ratio $\lambda = n/N$ arbitrarily close to 1, the problem of
estimating the smallest singular value becomes harder. One still expects (1.5) to
be true as long as $\lambda < 1$ is any constant. Indeed, this was proved in [16] for
arbitrary aspect ratios $\lambda < 1 - c/\log n$ and for general random matrices with
independent subgaussian entries. One has

(1.7)  $\mathbb{P}(s_n(A) \le c_\lambda \sqrt{N}) \le e^{-cN},$

where $c_\lambda > 0$ depends only on $\lambda$ and the maximal subgaussian moment of the
entries.

In subsequent work [1], the dependence of $c_\lambda$ on the aspect ratio in (1.7)
was improved for random $\pm 1$ matrices; however the probability estimate there
was weaker than in (1.7). An estimate for subgaussian random matrices of all
dimensions was obtained in [19]. For any $\varepsilon > CN^{-1/2}$, it was shown that

$\mathbb{P}\big(s_n(A) \le \varepsilon (1 - \lambda)(\sqrt{N} - \sqrt{n})\big) \le (C\varepsilon)^{N-n} + e^{-cN}.$

However, because of the factor $(1 - \lambda)$, this estimate is suboptimal and does
not correspond to the expected asymptotic behavior (1.4).

1.4. Square matrices. The extreme case for the problem of estimating the
singular value is for the square matrices, where $N = n$. Asymptotic (1.4) is
useless for square matrices. However, for "almost" square matrices, those with
constant defect $N - n = O(1)$, the quantity $\sqrt{N} - \sqrt{n}$ is of order $1/\sqrt{N}$, so
asymptotics (1.4) heuristically suggests that these matrices should satisfy

(1.8)  $s_n(A) \ge \frac{c}{\sqrt{N}}$ with high probability.

This conjecture was proved recently in [20] for all square subgaussian matrices:

(1.9)  $\mathbb{P}\Big(s_n(A) \le \frac{\varepsilon}{\sqrt{N}}\Big) \le C\varepsilon + e^{-cN}.$



1.5. New result: bridging all classes of matrices. In this paper, we prove
the conjectural bound for $s_n(A)$ valid for all subgaussian matrices in all fixed
dimensions $N$, $n$. The bound is optimal for matrices with all aspect ratios we
encountered above.

Theorem 1.1. Let $A$ be an $N \times n$ random matrix, $N \ge n$, whose elements
are independent copies of a mean zero subgaussian random variable with unit
variance. Then, for every $\varepsilon > 0$, we have

(1.10)  $\mathbb{P}\big(s_n(A) \le \varepsilon(\sqrt{N} - \sqrt{n-1})\big) \le (C\varepsilon)^{N-n+1} + e^{-cN},$

where $C, c > 0$ depend (polynomially) only on the subgaussian moment $B$.

For tall matrices, Theorem 1.1 clearly amounts to the known estimates (1.5),
(1.6). For square matrices ($N = n$), the quantity $\sqrt{N} - \sqrt{N-1}$ is of order
$1/\sqrt{N}$, so Theorem 1.1 amounts to the known estimates (1.8), (1.9). Finally,
for matrices that are arbitrarily close to square, Theorem 1.1 yields the new
optimal estimate

(1.11)  $s_n(A) \ge c(\sqrt{N} - \sqrt{n})$ with high probability.

This is a version of the asymptotics (1.4), now valid for all fixed dimensions.
This bound was explicitly conjectured e.g. in 
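
The three regimes discussed above can be made concrete with a few lines of arithmetic. The following is only an illustrative sketch: the helper name `singular_value_scale` and the sample dimensions are our own choices, not part of the paper. It evaluates the scale $\sqrt{N} - \sqrt{n-1}$ from Theorem 1.1 for a tall, an almost square, and a square matrix, and prints it relative to $\sqrt{N}$ and to $1/\sqrt{N}$.

```python
import math


def singular_value_scale(N: int, n: int) -> float:
    """Return sqrt(N) - sqrt(n - 1), the scale appearing in Theorem 1.1.

    The name and interface are illustrative assumptions; N >= n >= 1 is required.
    """
    if not (N >= n >= 1):
        raise ValueError("need N >= n >= 1")
    return math.sqrt(N) - math.sqrt(n - 1)


# Tall, almost square, and square sample dimensions (arbitrary choices).
for N, n in [(10_000, 100), (10_000, 9_000), (10_000, 10_000)]:
    scale = singular_value_scale(N, n)
    # scale / sqrt(N) is of constant order for tall matrices;
    # scale * sqrt(N) is of constant order for square matrices.
    print(N, n, scale, scale / math.sqrt(N), scale * math.sqrt(N))
```

In the square case the product `scale * sqrt(N)` equals $\sqrt{N}/(\sqrt{N} + \sqrt{N-1})$, which lies between 1/2 and 1, in line with the order $1/\sqrt{N}$ stated above.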

Theorem 1.1 seems to be new even for Gaussian matrices. Some early
progress was made by Edelman [5] and Szarek [22] who in particular proved
(1.9) for Gaussian matrices, see also the subsequent work by Edelman and
Sutton [6]. Gordon's inequality [10] can be used to prove that, for Gaussian
matrices, $\mathbb{E}\, s_n(A) \ge \sqrt{N} - \sqrt{n}$, see Theorem II.13 in [1]. One can further use
the concentration of measure inequality on the Euclidean sphere to estimate
the probability as

$\mathbb{P}(s_n(A) \le \sqrt{N} - \sqrt{n} - t) \le e^{-t^2/2}, \qquad t > 0.$

However, this bound is not optimal, and it becomes useless for matrices that
are close to square, when $N - n = o(\sqrt{n})$.

The form of estimate (1.10) may be expected if one recalls the classical $\varepsilon$-net
argument, which underlies many proofs in geometric functional analysis. By
(1.1), we are looking for a lower bound on $\|Ax\|_2$ that would hold uniformly for
all vectors $x$ on the unit Euclidean sphere $S^{n-1}$. For every fixed $x \in S^{n-1}$, the
quantity $\|Ax\|_2^2$ is the sum of $N$ independent random variables (the squares of
the coordinates of $Ax$). Therefore, the deviation inequalities lead us to expect
that $\|Ax\|_2$ is of the order $\sqrt{N}$ with probability exponential in $N$, i.e. $1 - e^{-cN}$.
We can run this argument separately for each vector $x$ in a small net $\mathcal{N}$ of the
sphere $S^{n-1}$, and then take the union bound to make the estimate uniform over
$x \in \mathcal{N}$. It is known how to choose a net $\mathcal{N}$ of cardinality exponential in the
dimension $n - 1$ of the sphere, i.e. $|\mathcal{N}| \le e^{C(n-1)}$. Therefore, with probability
$1 - e^{C(n-1)} e^{-cN}$, we have a good lower bound on $\|Ax\|_2 \sim \sqrt{N}$ for all vectors
$x$ in the net $\mathcal{N}$. Finally, one transfers this estimate from the net to the whole
sphere $S^{n-1}$ by approximation.

The problem with this argument is that the constants $C$ and $c$ are not the
same. Therefore, our estimate on the probability $1 - e^{C(n-1)} e^{-cN}$ is positive
only for tall matrices, when $N \ge (C/c)n$. To reach out to matrices of arbitrary
dimensions, one needs to develop much more sensitive versions of the $\varepsilon$-net
arguments. Nevertheless, the end result stated in Theorem 1.1 exhibits the
same two forces played against one another - the probability quantified by the
dimension $N$ and the complexity of the sphere $S^{n-1}$ quantified by its dimension
$n - 1$.

1.6. Small ball probabilities, distance problems, and additive structure.
Our proof of Theorem 1.1 is a development of our method in [20] for
square matrices. Dealing with rectangular matrices is in several ways
considerably harder. Several new tools are developed in this paper, which may be of
independent interest.

One new key ingredient is a small ball probability bound for sums of
independent random vectors in $\mathbb{R}^d$. We consider the sum $S = \sum_k a_k \xi_k$ where $\xi_k$
are i.i.d. random variables and $a_k$ are real coefficients. We then estimate the
probability that such a sum falls into a given small Euclidean ball in $\mathbb{R}^d$. Useful
upper bounds on the small ball probability must depend on the additive
structure of the coefficients $a_k$. The less structure the coefficients carry, the more
spread the distribution of $S$ is, so the smaller is the small ball probability. Our
treatment of small ball probabilities is a development of the Littlewood-Offord
theory from [20], which is now done in arbitrary dimension $d$ as opposed to
$d = 1$ in [20]. While this paper was being written, Friedland and Sodin [8]
proposed two different ways to simplify and improve our argument in [20]. With
their kind permission, we include in Section 3 a multi-dimensional version of
an unpublished argument of Friedland and Sodin [9], which is considerably
simpler than our original proof.

We use the small ball probability estimates to prove an optimal bound for the
distance problem: how close is a random vector to an independent random
subspace? Consider a vector $X$ in $\mathbb{R}^N$ with independent identically distributed
coordinates and a subspace $H$ spanned by $N - m$ independent copies of $X$.
In Section 4, we show that the distance is at least of order $\sqrt{m}$ with high
probability, and we obtain the sharp estimate on this probability:

(1.12)  $\mathbb{P}(\mathrm{dist}(X, H) < \varepsilon\sqrt{m}) \le (C\varepsilon)^m + e^{-cN}.$

This bound is easy for a standard normal vector $X$ in $\mathbb{R}^N$, since $\mathrm{dist}(X, H)$ is
in this case the Euclidean norm of the standard normal vector in $\mathbb{R}^m$. However,
for discrete distributions, such as for $X$ with $\pm 1$ random coordinates, estimate
(1.12) is non-trivial. In [20], it was proved for $m = 1$; in this paper we extend
the distance bound to all dimensions.

To prove (1.12), we first use the small ball probability inequalities to compute
the distance to an arbitrary subspace $H$. This estimate necessarily depends on
the additive structure of the subspace $H$; the less structure, the better is our
estimate, see Theorem 4.2. We then prove the intuitively plausible fact that
random subspaces have no arithmetic structure, see Theorem 4.3. This together
leads to the desired distance estimate (1.12).

The distance bound is then used to prove our main result, Theorem 1.1.
Let $X$ be some column of the random matrix $A$ and $H$ be the span of the
other columns. The simple rank argument shows that the smallest singular
value $s_n(A) = 0$ if and only if $X \in H$ for some column. A simple quantitative
version of this argument is that a lower estimate on $s_n(A)$ yields a lower bound
on $\mathrm{dist}(X, H)$.

In Section 6, we show how to reverse this argument for random matrices -
deduce a lower bound on the smallest singular value $s_n(A)$ from the lower bound
(1.12) on the distance $\mathrm{dist}(X, H)$. Our reverse argument is harder than its
version for square matrices from [20], where we had $m = 1$. First, instead
of one column $X$ we now have to consider all linear combinations of about
$m/2$ columns; see Lemma 6.2. To obtain a distance bound that would be
uniformly good for all such linear combinations, one would normally use an
$\varepsilon$-net argument. However, the distance to the $(N - m)$-dimensional subspace $H$
is not sufficiently stable for this argument to be useful for small $m$ (for matrices
close to square). We therefore develop a decoupling argument in Section 7 to
bypass this difficulty.

Once this is done, the proof is quickly completed in Section 8.

Acknowledgement. We are grateful to Shuheng Zhou, Nicole Tomczak-Jaegermann,
Radoslaw Adamczak, and the anonymous referee for pointing out
several inaccuracies in our argument. The second named author is grateful to
his wife Lilia for her love and patience during the years this paper was being
written.

2. Notation and preliminaries 

Throughout the paper, positive constants are denoted $C, C_1, C_2, c, c_1, c_2, \ldots$
Unless otherwise stated, these are absolute constants. In some of our
arguments they may depend (polynomially) on specified parameters, such as the
subgaussian moment $B$.

The canonical inner product on $\mathbb{R}^n$ is denoted $\langle \cdot, \cdot \rangle$, and the Euclidean norm
on $\mathbb{R}^n$ is denoted $\|\cdot\|_2$. The Euclidean distance from a point $a$ to a subset
$D$ in $\mathbb{R}^n$ is denoted $\mathrm{dist}(a, D)$. The Euclidean ball of radius $R$ centered at a
point $a$ is denoted $B(a, R)$. The unit Euclidean sphere centered at the origin
is denoted $S^{n-1}$. If $E$ is a subspace of $\mathbb{R}^n$, its unit Euclidean sphere is denoted
$S(E) := S^{n-1} \cap E$.

The orthogonal projection in $\mathbb{R}^n$ onto a subspace $E$ is denoted $P_E$. For a
subset of coordinates $J \subseteq \{1, \ldots, n\}$, we sometimes write $P_J$ for $P_{\mathbb{R}^J}$ where it
causes no confusion.

2.1. Nets. Consider a subset $D$ of $\mathbb{R}^n$, and let $\varepsilon > 0$. Recall that an $\varepsilon$-net of
$D$ is a subset $\mathcal{N} \subseteq D$ such that for every $x \in D$ one has $\mathrm{dist}(x, \mathcal{N}) \le \varepsilon$.
The following proposition is a variant of the well known volumetric estimate.

Proposition 2.1 (Nets). Let $S$ be a subset of $S^{n-1}$, and let $\varepsilon > 0$. Then there
exists an $\varepsilon$-net of $S$ of cardinality at most

$2n\Big(1 + \frac{2}{\varepsilon}\Big)^{n-1}.$

The published variants of this lemma (e.g. [17], Lemma 2.6) have exponent
$n$ rather than $n - 1$. Since the latter exponent will be crucial for our purposes,
we include the proof of this lemma for the reader's convenience.

Proof. Without loss of generality we can assume that $\varepsilon < 2$, otherwise any
single point forms a desired net. Let $\mathcal{N}$ be an $\varepsilon$-separated subset of $S$ of maximal
cardinality. By maximality, $\mathcal{N}$ is an $\varepsilon$-net of $S$. Since $\mathcal{N}$ is $\varepsilon$-separated, the
balls $B(x, \varepsilon/2)$ with centers $x \in \mathcal{N}$ are disjoint. All these balls have the same
volume, and they are contained in the spherical shell $B(0, 1 + \varepsilon/2) \setminus B(0, 1 - \varepsilon/2)$.
Therefore, comparing the volumes, we have

$|\mathcal{N}| \cdot \mathrm{vol}(B(0, \varepsilon/2)) \le \mathrm{vol}\big(B(0, 1 + \varepsilon/2) \setminus B(0, 1 - \varepsilon/2)\big).$

Dividing both sides of this inequality by $\mathrm{vol}(B(0, 1))$, we obtain

$|\mathcal{N}| \cdot (\varepsilon/2)^n \le (1 + \varepsilon/2)^n - (1 - \varepsilon/2)^n.$

Using the inequality $(1 + x)^n - (1 - x)^n \le 2nx(1 + x)^{n-1}$ valid for $x \in (0, 1)$,
we conclude that $|\mathcal{N}|$ is bounded as desired. This completes the proof. □
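
The two quantitative ingredients of this proof are easy to check numerically. The sketch below is illustrative only: the function names are hypothetical and it checks the elementary inequality on a finite grid rather than proving it. It evaluates the cardinality bound $2n(1 + 2/\varepsilon)^{n-1}$ and tests $(1+x)^n - (1-x)^n \le 2nx(1+x)^{n-1}$.

```python
import math


def net_cardinality_bound(n: int, eps: float) -> float:
    """Upper bound 2n(1 + 2/eps)^(n-1) on the size of an eps-net (Proposition 2.1)."""
    return 2 * n * (1 + 2 / eps) ** (n - 1)


def shell_inequality_holds(n: int, x: float) -> bool:
    """Check (1+x)^n - (1-x)^n <= 2nx(1+x)^(n-1) for one n and one x in (0, 1).

    A tiny relative tolerance absorbs floating-point rounding (n = 1 is an equality).
    """
    lhs = (1 + x) ** n - (1 - x) ** n
    rhs = 2 * n * x * (1 + x) ** (n - 1)
    return lhs <= rhs * (1 + 1e-12)


# For eps = 1/2 the bound reads 2n * 5^(n-1), which does not exceed 6^n.
for n in (1, 2, 3, 10):
    print(n, net_cardinality_bound(n, 0.5), 6 ** n)
```

The case $\varepsilon = 1/2$ is the one used for the norm estimate in Proposition 2.3.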

The following well known argument allows one to compute the norm of a 
linear operator using nets. We have not found a published reference to this 
argument, so we include it for the reader's convenience. 

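Proposition 2.2 (Computing norm on nets). Let $\mathcal{N}$ be an $\varepsilon$-net of $S^{n-1}$ and
$\mathcal{M}$ be a $\delta$-net of $S^{m-1}$. Then for any linear operator $A : \mathbb{R}^n \to \mathbb{R}^m$,

$\|A\| \le \frac{1}{(1 - \varepsilon)(1 - \delta)} \sup_{x \in \mathcal{N},\, y \in \mathcal{M}} |\langle Ax, y \rangle|.$

Proof. Every $z \in S^{n-1}$ has the form $z = x + h$, where $x \in \mathcal{N}$ and $\|h\|_2 \le \varepsilon$.
Since $\|A\| = \sup_{z \in S^{n-1}} \|Az\|_2$, the triangle inequality yields

$\|A\| \le \sup_{x \in \mathcal{N}} \|Ax\|_2 + \max_{\|h\|_2 \le \varepsilon} \|Ah\|_2.$

The last term in the right hand side is bounded by $\varepsilon\|A\|$. Therefore we have
shown that

$(1 - \varepsilon)\|A\| \le \sup_{x \in \mathcal{N}} \|Ax\|_2.$

Fix $x \in \mathcal{N}$. Repeating the above argument for $\|Ax\|_2 = \sup_{y \in S^{m-1}} |\langle Ax, y \rangle|$
yields the bound

$(1 - \delta)\|Ax\|_2 \le \sup_{y \in \mathcal{M}} |\langle Ax, y \rangle|.$

The two previous estimates complete the proof. □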
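
As a sanity check of Proposition 2.2, one can test it in the plane, where nets of the circle $S^1$ are explicit and the operator norm of a $2 \times 2$ matrix has a closed form. The following is a minimal sketch under those assumptions; all function names are ours and the test matrix is arbitrary.

```python
import math


def circle_net(eps: float) -> list[tuple[float, float]]:
    """Equally spaced points on the unit circle S^1 forming an eps-net (0 < eps < 2)."""
    # Two points of S^1 at angular distance phi are at chord distance 2*sin(phi/2),
    # so every point must be within angle 2*asin(eps/2) of some net point.
    max_angle = 2 * math.asin(eps / 2)
    count = math.ceil(math.pi / max_angle) + 1  # spacing 2*pi/count <= 2*max_angle
    return [(math.cos(2 * math.pi * j / count), math.sin(2 * math.pi * j / count))
            for j in range(count)]


def operator_norm_2x2(a: float, b: float, c: float, d: float) -> float:
    """Largest singular value of [[a, b], [c, d]] from the eigenvalues of A^T A."""
    trace = a * a + b * b + c * c + d * d      # trace of A^T A
    det = a * d - b * c                        # det(A^T A) = det(A)^2
    disc = max(trace * trace - 4 * det * det, 0.0)
    return math.sqrt((trace + math.sqrt(disc)) / 2)


def sup_on_nets(a: float, b: float, c: float, d: float,
                eps: float, delta: float) -> float:
    """sup of |<Ax, y>| over x in an eps-net and y in a delta-net of S^1."""
    net_x, net_y = circle_net(eps), circle_net(delta)
    return max(abs((a * x1 + b * x2) * y1 + (c * x1 + d * x2) * y2)
               for x1, x2 in net_x for y1, y2 in net_y)


norm = operator_norm_2x2(3.0, 1.0, 0.0, 2.0)
net_sup = sup_on_nets(3.0, 1.0, 0.0, 2.0, 0.5, 0.5)
print(net_sup, norm, net_sup / ((1 - 0.5) * (1 - 0.5)))
```

The printed values should be increasing: the supremum over the nets never exceeds $\|A\|$, and after division by $(1-\varepsilon)(1-\delta)$ it dominates $\|A\|$.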
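
Using nets, one easily proves the well known basic bound $O(\sqrt{N})$ on the
norm of a random subgaussian matrix:

Proposition 2.3 (Norm). Let $A$ be an $N \times n$ random matrix, $N \ge n$, whose
elements are independent copies of a subgaussian random variable. Then

$\mathbb{P}(\|A\| > t\sqrt{N}) \le e^{-c_0 t^2 N}$ for $t \ge C_0$,

where $C_0, c_0 > 0$ depend only on the subgaussian moment $B$.

Proof. Let $\mathcal{N}$ be a $(1/2)$-net of $S^{N-1}$ and $\mathcal{M}$ be a $(1/2)$-net of $S^{n-1}$. By
Proposition 2.1, we can choose these nets such that

$|\mathcal{N}| \le 2N \cdot 5^{N-1} \le 6^N, \qquad |\mathcal{M}| \le 2n \cdot 5^{n-1} \le 6^n.$

For every $x \in \mathcal{M}$ and $y \in \mathcal{N}$, the random variable $\langle Ax, y \rangle$ is subgaussian (see
Fact 2.1 in [16]), thus

$\mathbb{P}(|\langle Ax, y \rangle| > t\sqrt{N}) \le C_1 e^{-c_1 t^2 N}$ for $t > 0$,

where $C_1, c_1 > 0$ depend only on the subgaussian moment $B$. Using
Proposition 2.2 with $\varepsilon = \delta = 1/2$ and taking the union bound, we obtain

$\mathbb{P}(\|A\| > t\sqrt{N}) \le |\mathcal{N}|\,|\mathcal{M}| \max_{x \in \mathcal{M},\, y \in \mathcal{N}} \mathbb{P}(|\langle Ax, y \rangle| > t\sqrt{N}/4) \le 6^N \cdot 6^n \cdot C_1 e^{-c_1 t^2 N/16}.$

Since $N \ge n$, the right hand side is at most $e^{-c_0 t^2 N}$ once $t \ge C_0$ for suitable
$C_0, c_0 > 0$. This completes the proof. □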

2.2. Compressible and incompressible vectors. In our proof of
Theorem 1.1, we will make use of a partition of the unit sphere $S^{n-1}$ into two sets
of compressible and incompressible vectors. These sets were first defined in
[20] as follows.

Definition 2.4 (Compressible and incompressible vectors). Let $\delta, \rho \in (0, 1)$.
A vector $x \in \mathbb{R}^n$ is called sparse if $|\mathrm{supp}(x)| \le \delta n$. A vector $x \in S^{n-1}$ is called
compressible if $x$ is within Euclidean distance $\rho$ from the set of all sparse
vectors. A vector $x \in S^{n-1}$ is called incompressible if it is not compressible.
The sets of compressible and incompressible vectors will be denoted by
$\mathrm{Comp} = \mathrm{Comp}(\delta, \rho)$ and $\mathrm{Incomp} = \mathrm{Incomp}(\delta, \rho)$ respectively.
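
Definition 2.4 is straightforward to test for a given vector, because the nearest sparse vector is obtained by keeping the largest coordinates in absolute value and zeroing the rest. The sketch below is illustrative (the function names are our own) and treats "within distance $\rho$" as a non-strict inequality.

```python
import math


def distance_to_sparse(x: list[float], delta: float) -> float:
    """Euclidean distance from x to the set of vectors with at most delta*n nonzeros."""
    n = len(x)
    keep = math.floor(delta * n)  # largest admissible support size
    magnitudes = sorted((abs(t) for t in x), reverse=True)
    # The nearest sparse vector keeps the `keep` largest coordinates of x,
    # so the distance is the norm of the remaining coordinates.
    return math.sqrt(sum(t * t for t in magnitudes[keep:]))


def is_compressible(x: list[float], delta: float, rho: float) -> bool:
    """True if the unit vector x lies within distance rho of the sparse vectors."""
    return distance_to_sparse(x, delta) <= rho


n = 100
basis_vector = [1.0] + [0.0] * (n - 1)       # supported on one coordinate
flat_vector = [1 / math.sqrt(n)] * n         # mass spread evenly
print(is_compressible(basis_vector, 0.25, 0.5))  # a coordinate vector is sparse
print(is_compressible(flat_vector, 0.25, 0.5))   # distance sqrt(0.75) exceeds 0.5
```

With these sample parameters the coordinate vector is compressible while the flat vector is incompressible, which matches the intuition behind the two classes.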

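We now recall without proof two simple results. The first is Lemma 3.4 from
[20].

Lemma 2.5 (Incompressible vectors are spread). Let $x \in \mathrm{Incomp}(\delta, \rho)$. Then
there exists a set $\sigma = \sigma(x) \subseteq \{1, \ldots, n\}$ of cardinality $|\sigma| \ge \frac{1}{2}\rho^2 \delta n$ and such
that

(2.1)  $\frac{\rho}{\sqrt{2n}} \le |x_k| \le \frac{1}{\sqrt{\delta n}}$ for all $k \in \sigma$.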
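
For a concrete vector, the set of coordinates satisfying the two-sided bound (2.1) can simply be listed. The following sketch is an illustration with a hypothetical helper `spread_set` (indices are 0-based in the code, whereas the lemma uses $1, \ldots, n$); it checks the cardinality claim $|\sigma| \ge \frac{1}{2}\rho^2\delta n$ on the flat unit vector, which is incompressible for the sample parameters $\delta = 0.25$, $\rho = 0.5$.

```python
import math


def spread_set(x: list[float], delta: float, rho: float) -> list[int]:
    """Indices k (0-based) with rho/sqrt(2n) <= |x_k| <= 1/sqrt(delta*n), as in (2.1)."""
    n = len(x)
    lower = rho / math.sqrt(2 * n)
    upper = 1 / math.sqrt(delta * n)
    return [k for k, t in enumerate(x) if lower <= abs(t) <= upper]


n, delta, rho = 100, 0.25, 0.5
flat_vector = [1 / math.sqrt(n)] * n
sigma = spread_set(flat_vector, delta, rho)
# Lemma 2.5 predicts at least rho^2 * delta * n / 2 such coordinates.
print(len(sigma), 0.5 * rho ** 2 * delta * n)
```

Here every coordinate qualifies, so the lower bound of the lemma is far from tight for this particular vector.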

The other result is a variant of Lemma 3.3 from [20], which establishes the
invertibility on compressible vectors, and allows us to focus on incompressible
vectors in our proof of Theorem 1.1. While Lemma 3.3 was formulated in [20]
for a square matrix, the same proof applies to $N \times n$ matrices, provided that
$N \ge n/2$.

Lemma 2.6 (Invertibility for compressible vectors). Let $A$ be an $N \times n$ random
matrix, $N \ge n/2$, whose elements are independent copies of a subgaussian
random variable. There exist $\delta, \rho, c_3 > 0$ depending only on the subgaussian
moment $B$ such that

$\mathbb{P}\Big(\inf_{x \in \mathrm{Comp}(\delta, \rho)} \|Ax\|_2 \le c_3\sqrt{N}\Big) \le e^{-c_3 N}.$

□

3. Small ball probability and the arithmetic structure 

Starting from the works of Lévy [13], Kolmogorov [12] and Esseen [7], a
number of results in probability theory were concerned with the question of how
spread the sums of independent random variables are. It is convenient to
quantify the spread of a random variable in the following way.

Definition 3.1. The Lévy concentration function of a random vector $S$ in
$\mathbb{R}^m$ is defined for $\varepsilon > 0$ as

$\mathcal{L}(S, \varepsilon) = \sup_{v \in \mathbb{R}^m} \mathbb{P}(\|S - v\|_2 \le \varepsilon).$

An equivalent way of looking at the Lévy concentration function is that it
measures the small ball probabilities - the likelihood that the random vector
$S$ enters a small ball in the space. An exposition of the theory of small ball
probabilities can be found in [15].

One can derive a simple but rather weak bound on the Lévy concentration
function from the Paley-Zygmund inequality.

Lemma 3.2. Let $\xi$ be a random variable with mean zero, unit variance, and
finite fourth moment. Then for every $\varepsilon \in (0, 1)$ there exists $p \in (0, 1)$ which
depends only on $\varepsilon$ and on the fourth moment, and such that

$\mathcal{L}(\xi, \varepsilon) \le p.$

Remark. In particular, this bound holds for subgaussian random variables, and
with $p$ that depends only on $\varepsilon$ and the subgaussian moment.

Proof. We use the Paley-Zygmund inequality, which states for a random variable
$Z$ that

(3.1)  $\mathbb{P}(|Z| > \varepsilon) \ge \frac{(\mathbb{E}Z^2 - \varepsilon^2)^2}{\mathbb{E}Z^4}, \qquad \varepsilon > 0,$

see e.g. [16], Lemma 3.5.

Let $v \in \mathbb{R}$ and consider the random variable $Z = \xi - v$. Then

$\mathbb{E}Z^2 = 1 + v^2.$

By Hölder's inequality, we have

$B := \mathbb{E}\xi^4 \ge (\mathbb{E}\xi^2)^2 = 1,$

so, using Minkowski's inequality, we obtain

$(\mathbb{E}Z^4)^{1/4} \le B^{1/4} + |v| \le B^{1/4}(1 + |v|) \le B^{1/4}\, 2^{1/2} (1 + v^2)^{1/2}.$

Using this in (3.1), we conclude that

$\mathbb{P}(|\xi - v| > \varepsilon) \ge \frac{(1 + v^2 - \varepsilon^2)^2}{4B(1 + v^2)^2} \ge \frac{(1 - \varepsilon^2)^2}{4B}.$

This completes the proof. □

We will need a much stronger bound on the concentration function for sums
of independent random variables. Here we present a multi-dimensional version
of the inverse Littlewood-Offord inequality from [20]. While this paper was in
preparation, Friedland and Sodin [8] proposed two different ways to simplify
and improve our argument in [20]. We shall therefore present here a
multi-dimensional version of one of the arguments of Friedland and Sodin [9], which is
considerably simpler than our original proof.

We consider the sum

$S = \sum_{k=1}^{N} a_k \xi_k,$

where $\xi_k$ are independent and identically distributed random variables, and $a_k$
are some vectors in $\mathbb{R}^m$. The Littlewood-Offord theory describes the behavior
of the Lévy concentration function of $S$ in terms of the additive structure of
the vectors $a_k$.

In the scalar case, when $m = 1$, the additive structure of a sequence
$a = (a_1, \ldots, a_N)$ of real numbers can be described in terms of the shortest
arithmetic progression into which it (essentially) embeds. This length is
conveniently expressed as the essential least common denominator of $a$, defined as
follows. We fix parameters $\alpha, \gamma \in (0, 1)$, and define

$\mathrm{LCD}_{\alpha,\gamma}(a) := \inf\{\theta > 0 : \mathrm{dist}(\theta a, \mathbb{Z}^N) < \min(\gamma\|\theta a\|_2, \alpha)\}.$

The requirement that the distance is smaller than $\gamma\|\theta a\|_2$ forces us to consider
only non-trivial integer points as approximations of $\theta a$ - only those in a
non-trivial cone around the direction of $a$. One typically uses this definition with $\gamma$
a small constant, and for $\alpha = c\sqrt{N}$ with a small constant $c > 0$. The inequality
$\mathrm{dist}(\theta a, \mathbb{Z}^N) < \alpha$ then yields that most coordinates of $\theta a$ are within a small
constant distance from integers.
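
To get a feeling for the scalar definition, one can approximate the essential least common denominator by scanning a grid of dilations $\theta$. The sketch below is a brute-force illustration, not a method used in the paper; the names `dist_to_integers` and `lcd_grid`, the grid step, and the cutoff `theta_max` are assumptions made for the example, and the result is only accurate up to the grid step.

```python
import math


def dist_to_integers(v: list[float]) -> float:
    """Euclidean distance from v to the integer lattice Z^N."""
    return math.sqrt(sum((t - round(t)) ** 2 for t in v))


def lcd_grid(a: list[float], alpha: float, gamma: float,
             theta_max: float, step: float = 1e-3) -> float:
    """Smallest grid value theta in (0, theta_max] such that
    dist(theta*a, Z^N) < min(gamma*||theta*a||_2, alpha); math.inf if none is found."""
    norm_a = math.sqrt(sum(t * t for t in a))
    for j in range(1, int(theta_max / step) + 1):
        theta = j * step
        scaled = [theta * t for t in a]
        if dist_to_integers(scaled) < min(gamma * theta * norm_a, alpha):
            return theta
    return math.inf


# For a = (1, 1)/sqrt(2) the dilation theta = sqrt(2) hits the integer point (1, 1),
# and the cone condition is first met slightly earlier, at sqrt(2)/(1 + gamma).
a = [1 / math.sqrt(2), 1 / math.sqrt(2)]
print(lcd_grid(a, alpha=1.0, gamma=0.1, theta_max=3.0), math.sqrt(2) / 1.1)
```

For this two-dimensional example the infimum can be computed by hand: for $\theta < \sqrt{2}/2$ the nearest integer point is the origin, which the cone condition excludes, and for larger $\theta$ the condition $\sqrt{2} - \theta < \gamma\theta$ gives $\theta > \sqrt{2}/(1+\gamma)$.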

The definition of the essential least common denominator carries over
naturally to higher dimensions and thus allows one to control the arithmetic
structure of a sequence $a = (a_1, \ldots, a_N)$ of vectors $a_k \in \mathbb{R}^m$. To this end, we define
the product of such a multi-vector $a$ and a vector $\theta \in \mathbb{R}^m$ as

$\theta \cdot a = (\langle \theta, a_1 \rangle, \ldots, \langle \theta, a_N \rangle) \in \mathbb{R}^N.$

A more traditional way of looking at $\theta \cdot a$ is to regard it as the product of the
matrix $a$ with rows $a_k$ and the vector $\theta$.
Then we define, for $\alpha > 0$ and $\gamma \in (0, 1)$,

$\mathrm{LCD}_{\alpha,\gamma}(a) := \inf\{\|\theta\|_2 : \theta \in \mathbb{R}^m,\ \mathrm{dist}(\theta \cdot a, \mathbb{Z}^N) < \min(\gamma\|\theta \cdot a\|_2, \alpha)\}.$

The following theorem gives a bound on the small ball probability for a
random sum $S = \sum_{k=1}^{N} a_k \xi_k$ in terms of the additive structure of the coefficient
sequence $a$. The less structure in $a$, the bigger its least common denominator
is, and the smaller is the small ball probability for $S$.
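
The effect of additive structure on small ball probabilities is already visible for scalar sums with $\pm 1$ signs, where the Lévy concentration function can be computed exactly by enumeration. The sketch below is illustrative only (the function name and the two coefficient sequences are our choices): equal coefficients form a highly structured sequence, while powers of two make all $2^N$ sums distinct.

```python
from collections import Counter
from itertools import product


def levy_concentration(coeffs: list[int], eps: float) -> float:
    """sup_v P(|S - v| <= eps) for S = sum_k coeffs[k] * xi_k,
    where xi_k are independent symmetric +-1 signs (exact, by enumeration)."""
    counts = Counter(sum(c * s for c, s in zip(coeffs, signs))
                     for signs in product((-1, 1), repeat=len(coeffs)))
    atoms = sorted(counts)
    best = 0
    for i, left in enumerate(atoms):
        # An optimal interval [v - eps, v + eps] can be slid until its left end is an atom.
        mass = 0
        for value in atoms[i:]:
            if value - left > 2 * eps:
                break
            mass += counts[value]
        best = max(best, mass)
    return best / 2 ** len(coeffs)


N = 10
structured = [1] * N                      # S takes only N + 1 values
unstructured = [2 ** k for k in range(N)]  # all 2^N sums are distinct
print(levy_concentration(structured, 0.5), levy_concentration(unstructured, 0.5))
```

For $N = 10$ the structured sequence gives $\binom{10}{5}/2^{10} \approx 0.246$, of order $1/\sqrt{N}$, while the unstructured one gives $2^{-10}$.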

Theorem 3.3 (Small ball probability). Consider a sequence $a = (a_1, \ldots, a_N)$
of vectors $a_k \in \mathbb{R}^m$, which satisfies

(3.2)  $\sum_{k=1}^{N} \langle a_k, x \rangle^2 \ge \|x\|_2^2$ for every $x \in \mathbb{R}^m$.

Let $\xi_1, \ldots, \xi_N$ be independent and identically distributed, mean zero random
variables, such that $\mathcal{L}(\xi_k, 1) \le 1 - b$ for some $b > 0$. Consider the random sum
$S = \sum_{k=1}^{N} a_k \xi_k$. Then, for every $\alpha > 0$ and $\gamma \in (0, 1)$, and for

$\varepsilon \ge \frac{\sqrt{m}}{\mathrm{LCD}_{\alpha,\gamma}(a)},$

we have

$\mathcal{L}(S, \varepsilon\sqrt{m}) \le \Big(\frac{C\varepsilon}{\gamma\sqrt{b}}\Big)^m + C^m e^{-2b\alpha^2}.$

Remark. The non-degeneracy condition (3.2) is meant to guarantee that the
system of vectors $(a_k)$ is genuinely $m$-dimensional. It disallows these vectors
to lie on or close to any lower-dimensional subspace of $\mathbb{R}^m$.

Halász [11] developed a powerful approach to bounding the concentration
function; his approach influenced our arguments below. Halász [11] operated under
a similar non-degeneracy condition on the vectors $a_k$: for every $x \in S^{m-1}$, at
least $cN$ terms satisfy $|\langle a_k, x \rangle| \ge 1$. After properly rescaling $a_k$ by the factor
$\sqrt{c/N}$, Halász's condition is seen to be more restrictive than (3.2).
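
3.1. Proof of the Small Ball Probability Theorem. To estimate the Lévy
concentration function we apply the Esseen Lemma, see e.g. [23], p. 290.

Lemma 3.4. Let $Y$ be a random vector in $\mathbb{R}^m$. Then

$\sup_{v \in \mathbb{R}^m} \mathbb{P}(\|Y - v\|_2 \le \sqrt{m}) \le C^m \int_{B(0, \sqrt{m})} |\phi_Y(\theta)|\, d\theta,$

where $\phi_Y(\theta) = \mathbb{E}\exp(2\pi i \langle \theta, Y \rangle)$ is the characteristic function of $Y$.

Applying Lemma 3.4 to the vector $Y = S/\varepsilon$ and using the independence of
random variables $\xi_1, \ldots, \xi_N$, we obtain

(3.3)  $\mathcal{L}(S, \varepsilon\sqrt{m}) \le C^m \int_{B(0, \sqrt{m})} \prod_{k=1}^{N} |\phi(\langle \theta, a_k \rangle / \varepsilon)|\, d\theta,$

where $\phi(t) = \mathbb{E}\exp(2\pi i t \xi)$ is the characteristic function of $\xi := \xi_1$. To estimate
this characteristic function, we follow the conditioning argument of [18], [20].
Let $\xi'$ be an independent copy of $\xi$ and denote by $\bar{\xi}$ the symmetric random
variable $\xi - \xi'$. Then

$|\phi(t)|^2 = \mathbb{E}\exp(2\pi i t \bar{\xi}) = \mathbb{E}\cos(2\pi t \bar{\xi}).$

Using the inequality $|x| \le \exp(-\frac{1}{2}(1 - x^2))$, which is valid for all $x \in \mathbb{R}$, we
obtain

$|\phi(t)| \le \exp\Big(-\frac{1}{2}\big(1 - \mathbb{E}\cos(2\pi t \bar{\xi})\big)\Big).$

By conditioning on $\xi'$ we see that our assumption $\mathcal{L}(\xi, 1) \le 1 - b$ implies that
$\mathbb{P}(|\bar{\xi}| \ge 1) \ge b$. Therefore

$1 - \mathbb{E}\cos(2\pi t \bar{\xi}) \ge \mathbb{P}(|\bar{\xi}| \ge 1) \cdot \mathbb{E}\big(1 - \cos(2\pi t \bar{\xi}) \,\big|\, |\bar{\xi}| \ge 1\big)$

$\ge b \cdot \frac{4}{\pi^2}\, \mathbb{E}\Big(\min_{q \in \mathbb{Z}} |2\pi t \bar{\xi} - 2\pi q|^2 \,\Big|\, |\bar{\xi}| \ge 1\Big)$

$= 16b \cdot \mathbb{E}\Big(\min_{q \in \mathbb{Z}} |t \bar{\xi} - q|^2 \,\Big|\, |\bar{\xi}| \ge 1\Big).$

Substituting this into (3.3) and using Jensen's inequality, we get

$\mathcal{L}(S, \varepsilon\sqrt{m}) \le C^m \int_{B(0, \sqrt{m})} \exp\Big(-8b\, \mathbb{E}\Big(\sum_{k=1}^{N} \min_{q \in \mathbb{Z}} |\bar{\xi} \langle \theta, a_k \rangle / \varepsilon - q|^2 \,\Big|\, |\bar{\xi}| \ge 1\Big)\Big)\, d\theta$

$\le C^m\, \mathbb{E}\Big(\int_{B(0, \sqrt{m})} \exp\Big(-8b \min_{p \in \mathbb{Z}^N} \Big\|\frac{\bar{\xi}}{\varepsilon}\, \theta \cdot a - p\Big\|_2^2\Big)\, d\theta \,\Big|\, |\bar{\xi}| \ge 1\Big)$

$\le C^m \sup_{z \ge 1} \int_{B(0, \sqrt{m})} \exp(-8b f^2(\theta))\, d\theta,$

where

$f(\theta) = \min_{p \in \mathbb{Z}^N} \Big\|\frac{z}{\varepsilon}\, \theta \cdot a - p\Big\|_2.$

The next and major step is to bound the size of the recurrence set

$I(t) := \{\theta \in B(0, \sqrt{m}) : f(\theta) \le t\}.$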

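Lemma 3.5 (Size of the recurrence set). We have

$\mathrm{vol}(I(t)) \le \Big(\frac{Ct\varepsilon}{\gamma\sqrt{m}}\Big)^m, \qquad t < \alpha/2.$

Proof. Fix $t < \alpha/2$. Consider two points $\theta', \theta'' \in I(t)$. There exist $p', p'' \in \mathbb{Z}^N$
such that

$\Big\|\frac{z}{\varepsilon}\, \theta' \cdot a - p'\Big\|_2 \le t, \qquad \Big\|\frac{z}{\varepsilon}\, \theta'' \cdot a - p''\Big\|_2 \le t.$

Let

$\tau := \frac{z}{\varepsilon}(\theta' - \theta''), \qquad p := p' - p''.$

Then, by the triangle inequality,

(3.4)  $\|\tau \cdot a - p\|_2 \le 2t.$

Recall that by the assumption of the theorem,

$\mathrm{LCD}_{\alpha,\gamma}(a) \ge \frac{\sqrt{m}}{\varepsilon}.$

Therefore, by the definition of the least common denominator, we have that
either

$\|\tau\|_2 \ge \frac{\sqrt{m}}{\varepsilon},$

or otherwise

(3.5)  $\|\tau \cdot a - p\|_2 \ge \min(\gamma\|\tau \cdot a\|_2, \alpha).$

In the latter case, since $2t < \alpha$, inequalities (3.4) and (3.5) together yield

$2t \ge \gamma\|\tau \cdot a\|_2 \ge \gamma\|\tau\|_2,$

where the last inequality follows from condition (3.2).

Recalling the definition of $\tau$, we have proved that every pair of points
$\theta', \theta'' \in I(t)$ satisfies:

either $\|\theta' - \theta''\|_2 \ge \frac{\sqrt{m}}{z} =: R$ or $\|\theta' - \theta''\|_2 \le \frac{2t\varepsilon}{\gamma z} =: r.$

It follows that $I(t)$ can be covered by Euclidean balls of radii $r$, whose centers
are $R$-separated in the Euclidean distance. Since $I(t) \subseteq B(0, \sqrt{m})$, the number
of such balls is at most

$\frac{\mathrm{vol}(B(0, \sqrt{m} + R/2))}{\mathrm{vol}(B(0, R/2))} = \Big(\frac{2\sqrt{m}}{R} + 1\Big)^m \le \Big(\frac{3\sqrt{m}}{R}\Big)^m.$

(In the last inequality we used that $R \le \sqrt{m}$ because $z \ge 1$.) Recall that
the volume of a Euclidean ball of radius $r$ in $\mathbb{R}^m$ is bounded by $(Cr/\sqrt{m})^m$.
Summing these volumes, we conclude that

$\mathrm{vol}(I(t)) \le \Big(\frac{3Cr}{R}\Big)^m,$

which completes the proof of the lemma. □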

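Proof of Theorem 3.3. We decompose the domain into two parts. First, by the
definition of $I(t)$, we have

$\int_{B(0, \sqrt{m}) \setminus I(\alpha/2)} \exp(-8b f^2(\theta))\, d\theta \le \int_{B(0, \sqrt{m})} \exp(-2b\alpha^2)\, d\theta$

(3.6)  $\le C^m \exp(-2b\alpha^2).$

In the last line, we used the estimate $\mathrm{vol}(B(0, \sqrt{m})) \le C^m$.

Second, by the integral distribution formula and using Lemma 3.5, we have

$\int_{I(\alpha/2)} \exp(-8b f^2(\theta))\, d\theta = \int_0^{\alpha/2} 16bt \exp(-8bt^2)\, \mathrm{vol}(I(t))\, dt$

$\le 16b \Big(\frac{C\varepsilon}{\gamma\sqrt{m}}\Big)^m \int_0^{\infty} t^{m+1} \exp(-8bt^2)\, dt$

(3.7)  $\le \Big(\frac{C'\varepsilon}{\gamma\sqrt{b}}\Big)^m \sqrt{m} \le \Big(\frac{C''\varepsilon}{\gamma\sqrt{b}}\Big)^m.$

Combining (3.6) and (3.7) completes the proof of Theorem 3.3. □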

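3.2. Least common denominator of incompressible vectors. We now
prove a simple fact that the least common denominator of any incompressible
vector $a$ in $\mathbb{R}^N$ is at least of order $\sqrt{N}$. Indeed, by Lemma 2.5 such a vector
has many coordinates of order $1/\sqrt{N}$. Therefore, to make a dilation $\theta a$ of this
vector close to an integer point, one has to scale $a$ by at least $\theta \gtrsim \sqrt{N}$. We
now make this heuristic reasoning formal.

Lemma 3.6 (LCD of incompressible vectors). For every $\delta, \rho \in (0, 1)$ there
exist $c_1(\delta, \rho) > 0$ and $c_2(\delta) > 0$ such that the following holds. Let $a \in \mathbb{R}^N$ be
an incompressible vector: $a \in \mathrm{Incomp}(\delta, \rho)$. Then, for every $0 < \gamma < c_1(\delta, \rho)$
and every $\alpha > 0$, one has

$\mathrm{LCD}_{\alpha,\gamma}(a) > c_2(\delta)\sqrt{N}.$

Remark. The proof gives $c_1(\delta, \rho) = \frac{1}{2}\rho^2\sqrt{\delta}$ and $c_2(\delta) = \frac{1}{2}\sqrt{\delta}$.

Proof. By Lemma 2.5, there exists a set $\sigma_1 \subseteq \{1, \ldots, N\}$ of size

$|\sigma_1| \ge \frac{1}{2}\rho^2\delta N$

and such that

(3.8)  $\frac{\rho}{\sqrt{2N}} \le |a_k| \le \frac{1}{\sqrt{\delta N}}$ for $k \in \sigma_1$.

Let $\theta := \mathrm{LCD}_{\alpha,\gamma}(a)$. Then there exists $p \in \mathbb{Z}^N$ such that

$\|\theta a - p\|_2 < \gamma\|\theta a\|_2 = \gamma\theta.$

This shows in particular that $\theta > 0$; dividing by $\theta$ gives

$\Big\|a - \frac{p}{\theta}\Big\|_2 < \gamma.$

Then by Chebyshev's inequality, there exists a set $\sigma_2 \subseteq \{1, \ldots, N\}$ of size

$|\sigma_2| > N - \frac{1}{2}\rho^2\delta N$

and such that

(3.9)  $\Big|a_k - \frac{p_k}{\theta}\Big| < \frac{\sqrt{2}}{\rho\sqrt{\delta}} \cdot \frac{\gamma}{\sqrt{N}}$ for $k \in \sigma_2$.

Since $|\sigma_1| + |\sigma_2| > N$, there exists $k \in \sigma_1 \cap \sigma_2$. Fix this $k$. By the left hand
side of (3.8), by (3.9) and the assumption on $\gamma$ we have:

$\Big|\frac{p_k}{\theta}\Big| \ge \frac{\rho}{\sqrt{2N}} - \frac{\sqrt{2}}{\rho\sqrt{\delta}} \cdot \frac{\gamma}{\sqrt{N}} > 0.$

Thus $|p_k| > 0$; since $p_k$ is an integer, this yields $|p_k| \ge 1$. Similarly, using the
right hand side of (3.8), (3.9) and the assumption on $\gamma$, we get

$\Big|\frac{p_k}{\theta}\Big| \le \frac{1}{\sqrt{\delta N}} + \frac{\sqrt{2}}{\rho\sqrt{\delta}} \cdot \frac{\gamma}{\sqrt{N}} < \frac{2}{\sqrt{\delta N}}.$

Since $|p_k| \ge 1$, this yields

$|\theta| > \frac{\sqrt{\delta N}}{2}.$

This completes the proof. □

4. The distance problem and arithmetic structure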

Here we use the Small Ball Probability Theorem 3.3 to give an optimal
bound for the distance problem: how close is a random vector $X$ in $\mathbb{R}^N$ to
an independent random subspace $H$ of codimension $m$?

If $X$ has the standard normal distribution, then the distance does not
depend on the distribution of $H$. Indeed, for an arbitrary fixed $H$, the distance
$\mathrm{dist}(X, H)$ is distributed identically with the Euclidean norm of a standard
normal random vector in $\mathbb{R}^m$. Therefore,

$\mathrm{dist}(X, H) \sim \sqrt{m}$

with high probability. More precisely, standard computations give for every $\varepsilon > 0$ that

(4.1)  $\mathbb{P}(\mathrm{dist}(X, H) < \varepsilon\sqrt{m}) \le (C\varepsilon)^m.$

However, if $X$ has a more general distribution with independent coordinates,
the distance $\mathrm{dist}(X, H)$ may strongly depend on the subspace $H$. For example,
if the coordinates of $X$ are $\pm 1$ symmetric random variables, then for
$H = \{x : x_1 + x_2 = 0\}$ the distance equals 0 with probability 1/2, while for
$H = \{x : x_1 + \cdots + x_N = 0\}$ the distance equals 0 with probability $\sim 1/\sqrt{N}$.
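
Both probabilities in this example can be computed exactly, since the distance to a hyperplane $\{x : \langle x, u \rangle = 0\}$ vanishes precisely when $\langle X, u \rangle = 0$. The following sketch is an illustration with hypothetical function names; it compares the exact probability for the second hyperplane with the asymptotic value $\sqrt{2/(\pi N)}$ given by Stirling's formula for even $N$.

```python
import math
from itertools import product


def prob_zero_distance_sum(N: int) -> float:
    """P(x_1 + ... + x_N = 0) for independent symmetric +-1 signs (exact)."""
    if N % 2:
        return 0.0  # an odd number of +-1 signs cannot sum to zero
    return math.comb(N, N // 2) / 2 ** N


def prob_zero_distance_pair() -> float:
    """P(x_1 + x_2 = 0) for independent symmetric +-1 signs, by enumeration."""
    outcomes = list(product((-1, 1), repeat=2))
    return sum(1 for x in outcomes if x[0] + x[1] == 0) / len(outcomes)


print(prob_zero_distance_pair())
for N in (10, 100, 1000):
    exact = prob_zero_distance_sum(N)
    # exact * sqrt(N) approaches sqrt(2/pi) as N grows over even values.
    print(N, exact, exact * math.sqrt(N), math.sqrt(2 / math.pi))
```

The first hyperplane ignores all but two coordinates, so the probability stays at 1/2 for every $N$, whereas the second one involves all coordinates and the probability decays like $1/\sqrt{N}$.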

Nevertheless, a version of the distance bound (4.1) remains true for general
distributions if $H$ is a random subspace. For spaces of codimension $m = 1$,
this result was proved in [20]. In this paper, we prove an optimal distance
bound for general dimensions.

Theorem 4.1 (Distance to a random subspace). Let $X$ be a vector in $\mathbb{R}^N$
whose coordinates are independent and identically distributed mean zero
subgaussian random variables with unit variance. Let $H$ be a random subspace in
$\mathbb{R}^N$ spanned by $N - m$ vectors, $0 < m < \tilde{c}N$, whose coordinates are
independent and identically distributed mean zero subgaussian random variables with
unit variance, independent of $X$. Then, for every $v \in \mathbb{R}^N$ and every $\varepsilon > 0$, we
have

$\mathbb{P}(\mathrm{dist}(X, H + v) < \varepsilon\sqrt{m}) \le (C\varepsilon)^m + e^{-cN},$

where $C, c, \tilde{c} > 0$ depend only on the subgaussian moments.

Remark. To explain the term $e^{-cN}$, consider $\pm 1$ symmetric random variables.
Then with probability at least $2^{-N}$ the random vector $X$ coincides with one of
the random vectors that span $H$, which makes the distance equal zero.

We will deduce Theorem 4.1 from a more general inequality that holds for
an arbitrary fixed subspace $H$. This bound will depend on the arithmetic structure
of the subspace $H$, which we express using the least common denominator.

For $\alpha > 0$ and $\gamma \in (0, 1)$, the essential least common denominator of a
subspace $E$ in $\mathbb{R}^N$ is defined as

$\mathrm{LCD}_{\alpha,\gamma}(E) := \inf\{\mathrm{LCD}_{\alpha,\gamma}(a) : a \in S(E)\}.$

Clearly,

$\mathrm{LCD}_{\alpha,\gamma}(E) = \inf\{\|\theta\|_2 : \theta \in E,\ \mathrm{dist}(\theta, \mathbb{Z}^N) < \min(\gamma\|\theta\|_2, \alpha)\}.$

Then Theorem 3.3 quickly leads to the following general distance bound:

Theorem 4.2 (Distance to a general subspace). Let $X$ be a vector in
$\mathbb{R}^N$ whose coordinates are independent and identically distributed
mean zero subgaussian random variables with unit variance. Let $H$ be a
subspace in $\mathbb{R}^N$ of dimension $N - m > 0$. Then for every
$v \in \mathbb{R}^N$, $\alpha > 0$, $\gamma \in (0,1)$, and for

$\varepsilon \ge \frac{\sqrt{m}}{\mathrm{LCD}_{\alpha,\gamma}(H^\perp)},$

we have

$\mathbb{P}\big(\mathrm{dist}(X, H + v) < \varepsilon\sqrt{m}\big) \le \Big(\frac{C\varepsilon}{\gamma}\Big)^m + C^m e^{-c\alpha^2},$

where $C, c > 0$ depend only on the subgaussian moment.

Proof. Let us write $X$ in coordinates, $X = (\xi_1, \ldots, \xi_N)$. By
Lemma 3.2 and the remark below it, all coordinates of $X$ satisfy the
inequality $\mathcal{L}(\xi_k, 1/2) \le 1 - b$ for some $b > 0$ that depends
only on the subgaussian moment of $\xi_k$. Hence the random variables satisfy
the assumption in Theorem 3.3.

Next, we connect the distance to a sum of independent random vectors:

(4.2) $\mathrm{dist}(X, H + v) = \|P_{H^\perp}(X - v)\|_2 = \Big\|\sum_{k=1}^N a_k\xi_k - w\Big\|_2,$

where

$a_k = P_{H^\perp}e_k, \qquad w = P_{H^\perp}v,$

and where $e_1, \ldots, e_N$ denotes the canonical basis of $\mathbb{R}^N$.
Therefore, the sequence of vectors $a = (a_1, \ldots, a_N)$ is in the isotropic
position:

$\sum_{k=1}^N \langle a_k, x\rangle^2 = \|x\|_2^2$ for any $x \in H^\perp$,

so we can use Theorem 3.3 in the space $H^\perp$ (identified with
$\mathbb{R}^m$ by a suitable isometry).

For every $\theta = (\theta_1, \ldots, \theta_N) \in H^\perp$ and every $k$ we
have $\langle\theta, a_k\rangle = \langle P_{H^\perp}\theta, e_k\rangle =
\langle\theta, e_k\rangle = \theta_k$, so

$\theta \cdot a = \theta,$

where the right hand side is considered as a vector in $\mathbb{R}^N$.
Therefore the least common denominator of a subspace can be expressed by that
of a sequence of vectors $a = (a_1, \ldots, a_N)$:

$\mathrm{LCD}_{\alpha,\gamma}(H^\perp) = \mathrm{LCD}_{\alpha,\gamma}(a).$

The theorem now follows directly from Theorem 3.3. □
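Identity (4.2) expresses the distance to an affine subspace as the norm of a projection onto $H^\perp$. The following is a minimal numerical sketch of that identity, assuming $H$ is given by a list of spanning vectors; the helper names `orthonormal_basis` and `dist_to_affine` are hypothetical and the Gram-Schmidt routine is only meant for small, well-conditioned examples.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: an orthonormal basis of span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis

def dist_to_affine(x, spanning, v):
    """dist(x, H + v) = ||P_{H^perp}(x - v)||_2 for H = span(spanning)."""
    r = [xi - vi for xi, vi in zip(x, v)]
    # remove the component of x - v lying in H; what is left is P_{H^perp}(x - v)
    for q in orthonormal_basis(spanning):
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return math.sqrt(dot(r, r))

# H = {x in R^3 : x_3 = 0}, so H^perp is spanned by e_3 and m = 1.
H = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
print(dist_to_affine([2.0, -1.0, 3.0], H, [0.0, 0.0, 1.0]))
```

In this toy case the distance is simply $|x_3 - v_3|$, which matches the sum $\|\sum_k a_k\xi_k - w\|_2$ with $a_k = P_{H^\perp}e_k$.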

In order to deduce the Distance Theorem 4.1, it will now suffice to bound
below the least common denominator of a random subspace $H^\perp$.
Heuristically, the randomness should remove any arithmetic structure from the
subspace, thus making the least common denominator exponentially large. Our
next result shows that this is indeed true.

Theorem 4.3 (Structure of a random subspace). Let $H$ be a random subspace
in $\mathbb{R}^N$ spanned by $N - m$ vectors, $1 \le m < \tilde{c}N$, whose
coordinates are independent and identically distributed mean zero subgaussian
random variables with unit variance. Then, for $\alpha = c\sqrt{N}$, we have

$\mathbb{P}\big(\mathrm{LCD}_{\alpha,c}(H^\perp) < c\sqrt{N}e^{cN/m}\big) \le e^{-cN},$

where $c \in (0,1)$ and $\tilde{c} \in (0,1/2)$ depend only on the subgaussian moment.

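Assuming that this result holds, we can complete the proof of the Distance
Theorem 4.1.

Proof of Theorem 4.1. Consider the event

$\mathcal{E} := \big\{\mathrm{LCD}_{\alpha,c}(H^\perp) \ge c\sqrt{N}e^{cN/m}\big\}.$

By Theorem 4.3, $\mathbb{P}(\mathcal{E}^c) \le e^{-cN}$.

Let us condition on a realization of $H$ in $\mathcal{E}$. By the independence
of $X$ and $H$, Theorem 4.2 used with $\alpha = c\sqrt{N}$ and $\gamma = c$ gives

$\mathbb{P}\big(\mathrm{dist}(X, H) < \varepsilon\sqrt{m} \mid \mathcal{E}\big) \le (C_1\varepsilon)^m + C^m e^{-c_1N}$

for every

$\varepsilon \ge C_2\sqrt{\frac{m}{N}}\,e^{-cN/m}.$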

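Since $m \le \tilde{c}N$, with an appropriate choice of $\tilde{c}$ we get

$C_2\sqrt{\frac{m}{N}}\,e^{-cN/m} \le \frac{1}{C_1}e^{-c_3N/m}$ and $C^m e^{-c_1N} \le e^{-c_3N}.$

Therefore, for every $\varepsilon > 0$,

$\mathbb{P}\big(\mathrm{dist}(X, H) < \varepsilon\sqrt{m} \mid \mathcal{E}\big) \le (C_1\varepsilon)^m + 2e^{-c_3N} \le (C_1\varepsilon)^m + e^{-c_4N}.$

By the estimate on the probability of $\mathcal{E}^c$, this completes the proof. □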

4.1. Proof of the Structure Theorem 4.3. Note first that throughout the
proof we can assume that $N > N_0$, where $N_0$ is a suitably large number,
which may depend only on the subgaussian moment. Indeed, the assumption on
$m$ implies that $N > m/\tilde{c} \ge 1/\tilde{c}$. Choosing $\tilde{c} > 0$
suitably small depending on the subgaussian moment, we can make $N_0$ suitably
large.

Let $X_1, \ldots, X_{N-m}$ denote the independent random vectors that span the
subspace $H$. Consider an $(N - m) \times N$ random matrix $B$ with rows $X_k$.
Then

$H^\perp \subseteq \ker(B).$

Therefore, for every set $S$ in $\mathbb{R}^N$ we have:

(4.3) $\inf_{x \in S}\|Bx\|_2 > 0$ implies $H^\perp \cap S = \emptyset.$

This observation will help us to "navigate" the random subspace $H^\perp$ away
from undesired sets $S$ on the unit sphere.

We start with a variant of Lemma 3.6 of [20]; here we use the concept of
compressible and incompressible vectors in $\mathbb{R}^N$ rather than $\mathbb{R}^n$.

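Lemma 4.4 (Random subspaces are incompressible). There exist
$\delta, \rho \in (0,1)$ such that

$\mathbb{P}\big(H^\perp \cap S^{N-1} \subseteq \mathrm{Incomp}(\delta, \rho)\big) \ge 1 - e^{-cN}.$

Proof. Let $B$ be the $(N - m) \times N$ matrix defined above. Since
$N - m > (1 - \tilde{c})N$ and $\tilde{c} < 1/2$, we can apply Lemma 2.6 for
the matrix $B$. Thus, there exist $\delta, \rho \in (0,1)$ such that

$\mathbb{P}\Big(\inf_{x \in \mathrm{Comp}(\delta,\rho)}\|Bx\|_2 \ge c_3\sqrt{N}\Big) \ge 1 - e^{-c_3N}.$

By (4.3), $H^\perp \cap \mathrm{Comp}(\delta, \rho) = \emptyset$ with
probability at least $1 - e^{-c_3N}$. □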

Fix the values of $\delta$ and $\rho$ given by Lemma 4.4 for the rest of this
section. We will further decompose the set of incompressible vectors into
level sets $S_D$ according to the value of the least common denominator $D$.
We shall prove a nontrivial lower bound $\inf_{x \in S_D}\|Bx\|_2 > 0$ for each
level set up to $D$ of the exponential order. By (4.3), this will mean that
$H^\perp$ is disjoint from every such level set. Therefore, all vectors in
$H^\perp$ must have exponentially large least common denominators $D$. This is
Theorem 4.3.

Let $\alpha = \mu\sqrt{N}$, where $\mu > 0$ is a small number to be chosen
later, which depends only on the subgaussian moment. By Lemma 3.6,

$\mathrm{LCD}_{\alpha,c}(x) \ge c_0\sqrt{N}$ for every $x \in \mathrm{Incomp}$.

Definition 4.5 (Level sets). Let $D \ge c_0\sqrt{N}$. Define $S_D \subseteq S^{N-1}$ as

$S_D := \{x \in \mathrm{Incomp} : D \le \mathrm{LCD}_{\alpha,c}(x) < 2D\}.$

To obtain a lower bound for $\|Bx\|_2$ on the level set, we proceed by an
$\varepsilon$-net argument. To this end, we first need such a bound for a
single vector $x$.

Lemma 4.6 (Lower bound for a single vector). Let $x \in S_D$. Then for every
$t > 0$ we have

(4.4) $\mathbb{P}\big(\|Bx\|_2 < t\sqrt{N}\big) \le \Big(Ct + \frac{C}{D} + Ce^{-c\alpha^2}\Big)^{N-m}.$

Proof. Denoting the elements of $B$ by $\xi_{jk}$, we can write the $j$-th
coordinate of $Bx$ as

$(Bx)_j = \sum_{k=1}^N \xi_{jk}x_k =: \zeta_j, \qquad j = 1, \ldots, N - m.$

Now we can use the Small Ball Probability Theorem 3.3 in dimension $m = 1$
for each of these random sums. By Lemma 3.2 and the remark below it,
$\mathcal{L}(\xi_{jk}, 1/2) \le 1 - b$ for some $b > 0$ that depends only on
the subgaussian moment of $\xi_{jk}$. Hence the random variables satisfy the
assumption in Theorem 3.3. Since $\mathrm{LCD}_{\alpha,c}(x) \ge D$ for
$x \in S_D$, this gives for every $j$ and every $t > 0$:

$\mathbb{P}(|\zeta_j| < t) \le C\Big(t + \frac{1}{D}\Big) + Ce^{-c\alpha^2}.$

Since the $\zeta_j$ are independent random variables, we can use Tensorization
Lemma 2.2 of [20] to conclude that for every $t > 0$,

$\mathbb{P}\Big(\sum_{j=1}^{N-m}|\zeta_j|^2 < t^2(N - m)\Big) \le \Big(C''t + \frac{C''}{D} + C''e^{-c\alpha^2}\Big)^{N-m}.$

This completes the proof, because $\|Bx\|_2^2 = \sum_{j=1}^{N-m}|\zeta_j|^2$ and
$N \le 2(N - m)$ by the assumption. □

Next, we construct a small $\varepsilon$-net of the level set $S_D$. Since
this set lies in $S^{N-1}$, Lemma 2.1 yields the existence of a
$(\sqrt{N}/D)$-net of cardinality at most $(CD/\sqrt{N})^N$. This simple
volumetric bound is not sufficient for our purposes, and this is the crucial
step where we exploit the additive structure of $S_D$ to construct a smaller
net.

Lemma 4.7 (Nets of level sets). There exists a $(4\alpha/D)$-net of $S_D$ of
cardinality at most $(C_0D/\sqrt{N})^N$.

Remark. Recall that $\alpha$ is chosen as a small proportion of $\sqrt{N}$.
Hence Lemma 4.7 gives a better bound than the standard volumetric bound in
Lemma 2.1.

Proof. We can assume that $4\alpha/D \le 1$, otherwise the conclusion is
trivial. For $x \in S_D$, denote

$D(x) := \mathrm{LCD}_{\alpha,c}(x).$

By the definition of $S_D$, we have $D \le D(x) < 2D$. By the definition of the
least common denominator, there exists $p \in \mathbb{Z}^N$ such that

(4.5) $\|D(x)x - p\|_2 < \alpha.$

Therefore

$\Big\|x - \frac{p}{D(x)}\Big\|_2 < \frac{\alpha}{D(x)} \le \frac{\alpha}{D} \le \frac{1}{4}.$

Since $\|x\|_2 = 1$, it follows that

(4.6) $\Big\|x - \frac{p}{\|p\|_2}\Big\|_2 < \frac{2\alpha}{D}.$

On the other hand, by (4.5) and using that $\|x\|_2 = 1$, $D(x) \le 2D$ and
$4\alpha/D \le 1$, we obtain

(4.7) $\|p\|_2 < D(x) + \alpha \le 2D + \alpha \le 3D.$

Inequalities (4.6) and (4.7) show that every point $x \in S_D$ is within
Euclidean distance $2\alpha/D$ from the set

$\mathcal{N} := \Big\{\frac{p}{\|p\|_2} : p \in \mathbb{Z}^N \cap B(0, 3D)\Big\}.$

A known volumetric argument gives a bound on the number of integer points
in $B(0, 3D)$:

$|\mathcal{N}| \le (1 + 9D/\sqrt{N})^N \le (C_0D/\sqrt{N})^N$

(where in the last inequality we used that by Definition 4.5 of the level
sets, $D \ge c_0\sqrt{N}$). Finally, there exists a $(4\alpha/D)$-net of $S_D$
with the same cardinality as $\mathcal{N}$, and which lies in $S_D$. Indeed, to
obtain such a net, one selects one (arbitrary) point from the intersection of
$S_D$ with a ball of radius $2\alpha/D$ centered at each point from
$\mathcal{N}$. This completes the proof. □
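The rounding step behind (4.5) and (4.6) can be tried on a small example. The sketch below is an illustration under stated assumptions, not the construction used in the paper: for a unit vector $x$ it approximates $\inf\{\theta > 0 : \mathrm{dist}(\theta x, \mathbb{Z}^N) < \min(\gamma\theta, \alpha)\}$ (the formula for the least common denominator displayed earlier in this section, applied to the line spanned by $x$) by a grid search. The function names, the grid step, and the values of $\alpha$ and $\gamma$ are hypothetical choices.

```python
import math

def dist_to_integers(y):
    """Euclidean distance from y to the integer lattice Z^N."""
    return math.sqrt(sum((t - round(t)) ** 2 for t in y))

def lcd_grid(x, alpha, gamma, step=1e-3, theta_max=50.0):
    """Grid approximation of inf{theta > 0 : dist(theta*x, Z^N) < min(gamma*theta, alpha)}
    for a unit vector x. Returns None if no grid point up to theta_max qualifies."""
    k = 1
    while k * step <= theta_max:
        theta = k * step
        if dist_to_integers([theta * t for t in x]) < min(gamma * theta, alpha):
            return theta
        k += 1
    return None

x = [0.5, 0.5, 0.5, 0.5]           # unit vector with strong arithmetic structure
alpha, gamma = 0.1, 0.5            # illustrative parameters only
D_x = lcd_grid(x, alpha, gamma)    # close to 1.9 for this x
p = [round(D_x * t) for t in x]    # integer point as in (4.5)

# distance from x to p / D(x), to be compared with alpha / D(x)
err = math.sqrt(sum((t - pi / D_x) ** 2 for t, pi in zip(x, p)))
# distance from x to the normalized integer point p / ||p||_2, as in (4.6)
p_norm = math.sqrt(sum(pi * pi for pi in p))
err_normalized = math.sqrt(sum((t - pi / p_norm) ** 2 for t, pi in zip(x, p)))
print(D_x, p, err, alpha / D_x, err_normalized)
```

Here the integer point is $p = (1,1,1,1)$ and $p/\|p\|_2$ coincides with $x$, so the net point is exact; for less structured vectors the grid search only finds a qualifying $\theta$ at much larger scales.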

Lemma 4.8 (Lower bound for a level set). There exist $c_1, c_2, \mu \in (0,1)$
such that the following holds. Let $\alpha = \mu\sqrt{N} \ge 1$ and
$D \le c_1\sqrt{N}e^{c_1N/m}$. Then

$\mathbb{P}\Big(\inf_{x \in S_D}\|Bx\|_2 < c_2N/D\Big) \le 2e^{-N}.$

Proof. By Lemma 2.3, there exists $K \ge 1$ that depends only on the
subgaussian moment and such that

$\mathbb{P}\big(\|B\| > K\sqrt{N}\big) \le e^{-N}.$

Therefore, in order to complete the proof, it is enough to find $\nu > 0$,
which depends only on the subgaussian moment, and such that the event

$\mathcal{E} := \Big\{\inf_{x \in S_D}\|Bx\|_2 < \frac{\nu N}{2D} \text{ and } \|B\| \le K\sqrt{N}\Big\}$

has probability at most $e^{-N}$.

We claim that this holds with the following choice of parameters:

$\nu = \frac{1}{(3CC_0)^2e}, \qquad \mu = \frac{\nu}{9K}, \qquad c_1 = c\mu^2 < \nu,$

where $C \ge 1$ and $c \in (0,1)$ are the constants from Lemma 4.6 and
$C_0 \ge 1$ is the constant from Lemma 4.7.

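By choosing $\tilde{c}$ in the statement of Theorem 4.3 suitably small, we can
assume that $N > \nu^{-2}$ (this is because by the assumptions,
$N > m/\tilde{c} \ge 1/\tilde{c}$). We apply Lemma 4.6 with
$t = \nu\sqrt{N}/D$. Then recalling the choice of $\alpha$ and $c_1$ and our
assumption on $D$, one easily checks that the term $Ct$ dominates in the right
hand side of (4.4):

$t \ge 1/D$ and $t \ge e^{-c\alpha^2}$.

This gives for arbitrary $x_0 \in S_D$:

$\mathbb{P}\Big(\|Bx_0\|_2 < \frac{\nu N}{D}\Big) \le \Big(\frac{3C\nu\sqrt{N}}{D}\Big)^{N-m}.$

Now we use Lemma 4.7, which yields a small $(4\alpha/D)$-net $\mathcal{N}$ of
$S_D$. Taking the union bound, we get

$p := \mathbb{P}\Big(\inf_{x_0 \in \mathcal{N}}\|Bx_0\|_2 < \frac{\nu N}{D}\Big) \le \Big(\frac{C_0D}{\sqrt{N}}\Big)^N\Big(\frac{3C\nu\sqrt{N}}{D}\Big)^{N-m}.$

Denote $C_1 := 3CC_0$. Using the fact that $c_1 < \nu$ and our assumption on
$D$, we have:

(4.8) $p \le C_1^N\Big(\frac{D}{\sqrt{N}}\Big)^m\nu^{N-m} \le C_1^N\big(\nu e^{N/m}\big)^m\nu^{N-m} = (C_1e\nu)^N \le e^{-N}.$

Assume $\mathcal{E}$ occurs. Fix $x \in S_D$ for which
$\|Bx\|_2 < \nu N/(2D)$; it can be approximated by some element
$x_0 \in \mathcal{N}$ as

$\|x - x_0\|_2 \le \frac{4\alpha}{D}.$

Therefore, by the triangle inequality we have

$\|Bx_0\|_2 \le \|Bx\|_2 + \|B\| \cdot \|x - x_0\|_2 < \frac{\nu N}{2D} + K\sqrt{N} \cdot \frac{4\alpha}{D} < \frac{\nu N}{D},$

where in the last inequality we used our choice of $\mu$.

We have shown that the event $\mathcal{E}$ implies the event that

$\inf_{x_0 \in \mathcal{N}}\|Bx_0\|_2 < \frac{\nu N}{D},$

whose probability is at most $e^{-N}$ by (4.8). The proof is complete. □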



Proof of Theorem 4.3. Consider $x \in S^{N-1}$ such that

$\mathrm{LCD}_{\alpha,c}(x) < c_1\sqrt{N}e^{c_1N/m},$

where $c_1$ is the constant from Lemma 4.8. Then, by Definition 4.5 of the
level sets, either $x$ is compressible or $x \in S_D$ for some
$D \in \mathcal{D}$, where

$\mathcal{D} = \{D : c_0\sqrt{N} \le D < c_1\sqrt{N}e^{c_1N/m},\ D = 2^k,\ k \in \mathbb{N}\}.$

Therefore, recalling the definition of the least common denominator of the
subspace,

$\mathrm{LCD}_{\alpha,c}(H^\perp) = \inf_{x \in S(H^\perp)}\mathrm{LCD}_{\alpha,c}(x),$

we can decompose the desired probability as follows:

$p := \mathbb{P}\big(\mathrm{LCD}_{\alpha,c}(H^\perp) < c_1\sqrt{N}e^{c_1N/m}\big) \le \mathbb{P}(H^\perp \cap \mathrm{Comp} \ne \emptyset) + \sum_{D \in \mathcal{D}}\mathbb{P}(H^\perp \cap S_D \ne \emptyset).$

By Lemma 4.4, the first term in the right hand side is bounded by $e^{-cN}$.
Further terms can be bounded using (4.3) and Lemma 4.8:

$\mathbb{P}(H^\perp \cap S_D \ne \emptyset) \le \mathbb{P}\Big(\inf_{x \in S_D}\|Bx\|_2 = 0\Big) \le 2e^{-N}.$

Since there are $|\mathcal{D}| \le C'N$ terms in the sum, we conclude that

$p \le e^{-cN} + C'Ne^{-N} \le e^{-c'N}.$

This completes the proof. □
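The count of dyadic levels used in the last step can be made explicit. The computation below is a sketch added for the reader, under our reading of the constants: the levels $D = 2^k$ range between $c_0\sqrt{N}$ and an upper level of the form $c_1\sqrt{N}e^{c_1N/m}$, and $m \ge 1$, $N \ge 1$.

```latex
|\mathcal{D}|
  \le 1 + \log_2\frac{c_1\sqrt{N}\,e^{c_1 N/m}}{c_0\sqrt{N}}
  = 1 + \log_2\frac{c_1}{c_0} + \frac{c_1 N}{m\ln 2}
  \le C' N ,
\qquad\text{hence}\qquad
\sum_{D\in\mathcal{D}} 2e^{-N} \le 2C'N e^{-N} .
```

For $N$ larger than a constant depending on $C'$, the factor $2C'N$ is absorbed by passing to a slightly smaller constant in the exponent.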

5. Decomposition of the sphere 

Now we begin the proof of Theorem 1.1. We will make several useful
reductions first.

Without loss of generality, we can assume that the entries of $A$ have an
absolutely continuous distribution. Indeed, we can add to each entry an
independent Gaussian random variable with small variance $\sigma$, and later
let $\sigma \to 0$.

Similarly, we can assume that $n \ge n_0$, where $n_0$ is a suitably large
number that depends only on the subgaussian moment $B$.

We let

$N = n - 1 + d$

for some $d \ge 1$. We can assume that

(5.1) $1 \le d \le c_0n,$

with a suitably small constant $c_0 > 0$ that depends only on the subgaussian
moment $B$. Indeed, as we remarked in the Introduction, for the values of $d$
above a constant proportion of $n$, Theorem 1.1 follows from (1.7). Note that

$\sqrt{N} - \sqrt{n-1} \le \frac{d}{\sqrt{n}}.$

Using the decomposition of the sphere $S^{n-1} = \mathrm{Comp} \cup \mathrm{Incomp}$,
we break the invertibility problem into two subproblems, for compressible and
incompressible vectors:

(5.2) $\mathbb{P}\big(s_n(A) \le \varepsilon(\sqrt{N} - \sqrt{n-1})\big) \le \mathbb{P}\Big(s_n(A) \le \varepsilon\frac{d}{\sqrt{n}}\Big)$
      $\le \mathbb{P}\Big(\inf_{x \in \mathrm{Comp}(\delta,\rho)}\|Ax\|_2 \le \varepsilon\frac{d}{\sqrt{n}}\Big) + \mathbb{P}\Big(\inf_{x \in \mathrm{Incomp}(\delta,\rho)}\|Ax\|_2 \le \varepsilon\frac{d}{\sqrt{n}}\Big).$

A bound for the compressible vectors follows from Lemma 2.6. Using (5.1)
we get

$\frac{d}{\sqrt{n}} \le c_0\sqrt{n} \le c_0\sqrt{N}.$

Hence, Lemma 2.6 implies

(5.3) $\mathbb{P}\Big(\inf_{x \in \mathrm{Comp}(\delta,\rho)}\|Ax\|_2 \le \varepsilon\frac{d}{\sqrt{n}}\Big) \le e^{-c_3N}.$

It remains to find a lower bound on $\|Ax\|_2$ for the incompressible vectors $x$.
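The elementary inequality between $\sqrt{N} - \sqrt{n-1}$ and $d/\sqrt{n}$ used in the reduction (5.2) can be verified in one line. The computation below is a routine sketch added for the reader; it only uses $N = n - 1 + d$ and $N \ge n$ (that is, $d \ge 1$).

```latex
\sqrt{N}-\sqrt{n-1}
  = \frac{N-(n-1)}{\sqrt{N}+\sqrt{n-1}}
  = \frac{d}{\sqrt{N}+\sqrt{n-1}}
  \le \frac{d}{\sqrt{N}}
  \le \frac{d}{\sqrt{n}} .
```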

6. Invertibility via uniform distance bounds

In this section, we reduce the problem of bounding $\|Ax\|_2$ for
incompressible vectors $x$ to the distance problem that we addressed in
Section 4.

Let $X_1, \ldots, X_n \in \mathbb{R}^N$ denote the columns of the matrix $A$.
Given a subset $J \subseteq \{1, \ldots, n\}$ of cardinality $d$, we consider
the subspace

$H_J := \mathrm{span}(X_k)_{k \in J} \subseteq \mathbb{R}^N.$

For levels $K_1, K_2 > 0$ that will only depend on $\delta, \rho$, we define
the set of totally spread vectors

(6.1) $\mathrm{Spread}_J := \Big\{y \in S(\mathbb{R}^J) : \frac{K_1}{\sqrt{d}} \le |y_k| \le \frac{K_2}{\sqrt{d}} \text{ for all } k \in J\Big\}.$

In the following lemma, we let $J$ be a random subset uniformly distributed
over all subsets of $\{1, \ldots, n\}$ of cardinality $d$. To avoid confusion,
we often denote the probability and expectation over the random set $J$ by
$\mathbb{P}_J$ and $\mathbb{E}_J$, and with respect to the random matrix $A$ by
$\mathbb{P}_A$ and $\mathbb{E}_A$.

Lemma 6.1 (Total spread). For every $\delta, \rho \in (0,1)$, there exist
$K_1, K_2, c_0 > 0$ which depend only on $\delta, \rho$, and such that the
following holds. For every $x \in \mathrm{Incomp}(\delta, \rho)$, the event

$\mathcal{E}(x) := \Big\{\frac{P_Jx}{\|P_Jx\|_2} \in \mathrm{Spread}_J \text{ and } \frac{\rho\sqrt{d}}{\sqrt{2n}} \le \|P_Jx\|_2 \le \frac{\sqrt{d}}{\sqrt{\delta n}}\Big\}$

satisfies $\mathbb{P}_J(\mathcal{E}(x)) > c_0^d$.

Remark. The proof gives $K_1 = \rho\sqrt{\delta/2}$, $K_2 = 1/K_1$,
$c_0 = \rho^2\delta/2e$. In the rest of the proof, we shall use definition
(6.1) of $\mathrm{Spread}_J$ with these values of the levels $K_1, K_2$.

Proof. Let $\sigma \subset \{1, \ldots, n\}$ be the subset from Lemma 2.5.
Recall that the parameters $\delta$ and $\rho$ depend only on the subgaussian
moment $B$ (see Lemma 2.6). By choosing the constant $c_0$ in (5.1)
appropriately small, we may assume that $d \le |\sigma|/2$. Then, using
Stirling's approximation we have

$\mathbb{P}_J(J \subset \sigma) = \binom{|\sigma|}{d}\Big/\binom{n}{d} > \Big(\frac{\rho^2\delta}{2e}\Big)^d = c_0^d.$

If $J \subset \sigma$, then summing (2.1) over $k \in J$, we obtain the
required two-sided bound for $\|P_Jx\|_2$. This and (2.1) yield
$P_Jx/\|P_Jx\|_2 \in \mathrm{Spread}_J$. Hence $\mathcal{E}(x)$ holds. □
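The Stirling-type estimate in this proof rests on the two elementary bounds $\binom{s}{d} \ge (s/d)^d$ and $\binom{n}{d} \le (en/d)^d$, which together give $\binom{s}{d}/\binom{n}{d} \ge (s/(en))^d$. The following is a small numerical sketch of that inequality; the sizes `n` and `s` (playing the role of $|\sigma|$) are hypothetical values chosen for illustration, and the function names are ours.

```python
import math

def subset_probability(n: int, s: int, d: int) -> float:
    """P(J is contained in sigma) for a uniform d-subset J of {1..n}, where |sigma| = s."""
    return math.comb(s, d) / math.comb(n, d)

def stirling_lower_bound(n: int, s: int, d: int) -> float:
    """(s / (e n))^d, from C(s, d) >= (s/d)^d and C(n, d) <= (e n / d)^d."""
    return (s / (math.e * n)) ** d

n, s = 1000, 200            # hypothetical sizes; s plays the role of |sigma|
for d in (1, 5, 20, 100):   # d <= s/2, as assumed in the proof
    print(d, subset_probability(n, s, d), stirling_lower_bound(n, s, d))
```

With $s \ge \rho^2\delta n/2$, a lower bound of this form is at least $(\rho^2\delta/2e)^d$, which is the shape of the constant $c_0^d$ in the lemma.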

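Lemma 6.2 (Invertibility via distance). Let $\delta, \rho \in (0,1)$. There
exist $C_1, c_1 > 0$ which depend only on $\delta, \rho$, and such that the
following holds. Let $J$ be any $d$-element subset of $\{1, \ldots, n\}$. Then
for every $\varepsilon > 0$

(6.2) $\mathbb{P}\Big(\inf_{x \in \mathrm{Incomp}(\delta,\rho)}\|Ax\|_2 < c_1\varepsilon\sqrt{\frac{d}{n}}\Big) \le C_1^d \cdot \mathbb{P}\Big(\inf_{z \in \mathrm{Spread}_J}\mathrm{dist}(Az, H_{J^c}) < \varepsilon\Big).$

Remark. The proof gives $K_1 = \rho\sqrt{\delta/2}$, $K_2 = 1/K_1$,
$c_1 = \rho/\sqrt{2}$, $C_1 = 2e/\rho^2\delta$.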

Proof. Let $x \in \mathrm{Incomp}(\delta, \rho)$. For every subset $J$ of
$\{1, \ldots, n\}$ we have

$\|Ax\|_2 \ge \mathrm{dist}(Ax, H_{J^c}) = \mathrm{dist}(AP_Jx, H_{J^c}).$

In case the event $\mathcal{E}(x)$ of Lemma 6.1 holds, we use the vector
$z = P_Jx/\|P_Jx\|_2 \in \mathrm{Spread}_J$ to check that

$\|Ax\|_2 \ge \|P_Jx\|_2\,D(A, J),$

where the random variable

$D(A, J) = \inf_{z \in \mathrm{Spread}_J}\mathrm{dist}(Az, H_{J^c})$

is independent of $x$. Moreover, using the estimate on $\|P_Jx\|_2$ in the
definition of the event $\mathcal{E}(x)$, we conclude that

(6.3) $\mathcal{E}(x)$ implies $\|Ax\|_2 \ge c_1\sqrt{\frac{d}{n}}\,D(A, J).$

Define the event

$\mathcal{F} := \big\{A : \mathbb{P}_J(D(A, J) \ge \varepsilon) > 1 - c_0^d\big\},$

where $c_0$ is the constant from Lemma 6.1. Chebyshev's inequality and
Fubini's theorem then yield

$\mathbb{P}_A(\mathcal{F}^c) \le c_0^{-d}\,\mathbb{E}_A\mathbb{P}_J(D(A, J) < \varepsilon) = c_0^{-d}\,\mathbb{E}_J\mathbb{P}_A(D(A, J) < \varepsilon).$

Since the entries of $A$ are independent and identically distributed, the
probability $\mathbb{P}_A(D(A, J) < \varepsilon)$ does not depend on $J$.
Therefore, the right hand side of the previous inequality coincides with the
right hand side of (6.2).

Fix any realization of $A$ for which $\mathcal{F}$ occurs, and fix any
$x \in \mathrm{Incomp}(\delta, \rho)$. Then

$\mathbb{P}_J(D(A, J) \ge \varepsilon) + \mathbb{P}_J(\mathcal{E}(x)) > (1 - c_0^d) + c_0^d = 1,$

so we conclude that

(6.4) $\mathbb{P}_J\big(\mathcal{E}(x) \text{ and } D(A, J) \ge \varepsilon\big) > 0.$

We have proved that for every $x \in \mathrm{Incomp}(\delta, \rho)$ there
exists a subset $J = J(x)$ that satisfies both $\mathcal{E}(x)$ and
$D(A, J) \ge \varepsilon$. Using this $J$ in (6.3), we conclude that every
matrix $A$ for which the event $\mathcal{F}$ occurs satisfies

$\inf_{x \in \mathrm{Incomp}(\delta,\rho)}\|Ax\|_2 \ge \varepsilon c_1\sqrt{\frac{d}{n}}.$

This and the estimate of $\mathbb{P}_A(\mathcal{F}^c)$ complete the proof. □
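The step combining Chebyshev's inequality with Fubini's theorem in this proof can be unpacked as follows. This is a sketch in our own notation, writing $D = D(A, J)$ and using only that the complement of the event defined through $\mathbb{P}_J(D \ge \varepsilon) > 1 - c_0^d$ is the event $\{\mathbb{P}_J(D < \varepsilon) \ge c_0^d\}$.

```latex
\mathbb{P}_A\big(\mathbb{P}_J(D<\varepsilon) \ge c_0^{d}\big)
  \le c_0^{-d}\,\mathbb{E}_A\,\mathbb{P}_J(D<\varepsilon)
  = c_0^{-d}\,\mathbb{E}_A\,\mathbb{E}_J\,\mathbf{1}_{\{D<\varepsilon\}}
  = c_0^{-d}\,\mathbb{E}_J\,\mathbb{E}_A\,\mathbf{1}_{\{D<\varepsilon\}}
  = c_0^{-d}\,\mathbb{E}_J\,\mathbb{P}_A(D<\varepsilon).
```

The first inequality is Markov's (first moment Chebyshev) inequality applied to the nonnegative random variable $\mathbb{P}_J(D < \varepsilon)$, viewed as a function of $A$; the exchange of expectations is Fubini's theorem for the independent pair $(A, J)$.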




7. The uniform distance bound 

In this section, we shall estimate the distance between a random ellipsoid
and a random independent subspace. This is the distance that we need to
bound in the right hand side of (6.2).

Throughout this section, we let $J$ be a fixed subset of $\{1, \ldots, n\}$,
$|J| = d$. We shall use the notation introduced in the beginning of Section 6.
Thus, $H_J$ denotes a random subspace, and $\mathrm{Spread}_J$ denotes the
totally spread set whose levels $K_1, K_2$ depend only on $\delta, \rho$ in the
definition of incompressibility.

We will denote by $K, K_0, C, c, C_1, c_1, \ldots$ positive numbers that depend
only on $\delta, \rho$ and the subgaussian moment $B$.

Theorem 7.1 (Uniform distance bound). For every $t > 0$,

$\mathbb{P}\Big(\inf_{z \in \mathrm{Spread}_J}\mathrm{dist}(Az, H_{J^c}) < t\sqrt{d}\Big) \le (Ct)^d + e^{-cN}.$

Recall that $H_{J^c}$ is the span of $n - d$ independent random vectors. Since
their distribution is absolutely continuous (see the beginning of Section 5),
these vectors are almost surely in general position, so

(7.1) $\dim(H_{J^c}) = n - d.$

Without loss of generality, in the proof of Theorem 7.1 we can assume that

(7.2) $t \ge t_0 = e^{-\bar{c}N/d}$

with a suitably small $\bar{c} > 0$.

7.1. First approach: nets and union bound. We would like to prove
Theorem 7.1 by a typical $\varepsilon$-net argument. Theorem 4.1 will give a
useful probability bound for an individual $z \in S^{n-1}$. We might then take
a union bound over all $z$ in an $\varepsilon$-net of $\mathrm{Spread}_J$ and
complete by approximation. However, the standard approximation argument will
leave us with a larger error $e^{-cd}$ in the probability, which is
unsatisfactory for small $d$. To overcome this, we shall refine this approach
using decoupling in Section 7.2.

For now, we start with a bound for an individual $z \in S^{n-1}$.

Lemma 7.2. Let $z \in S^{n-1}$ and $v \in \mathbb{R}^N$. Then for every $t$
that satisfies (7.2) we have

$\mathbb{P}\big(\mathrm{dist}(Az, H_{J^c} + v) < t\sqrt{d}\big) \le (C_1t)^{2d-1}.$

Proof. Denote the entries of the matrix $A$ by $\xi_{ij}$. Then the entries of
the random vector $Az$,

$\zeta_i := (Az)_i = \sum_{j=1}^n \xi_{ij}z_j, \qquad i = 1, \ldots, N,$

are independent and identically distributed mean zero random variables.
Moreover, since the random variables $\xi_{ij}$ are subgaussian and
$\sum_{j=1}^n z_j^2 = 1$, the random variables $\zeta_i$ are also subgaussian
(see Fact 2.1 in [15]).

Therefore the random vector $X = Az$ and the random subspace $H = H_{J^c}$
satisfy the assumptions of Theorem 4.1 with $m = N - (n - d) = 2d - 1$ (we
used (7.1) here). An application of Theorem 4.1 completes the proof. □

We will use this bound for every $z$ in an $\varepsilon$-net of
$\mathrm{Spread}_J$. To extend the bound to the whole set $\mathrm{Spread}_J$
by approximation, we need a certain stability of the distance. This is easy to
quantify and prove using the following representation of the distance in
matrix form. Let $P$ be the orthogonal projection in $\mathbb{R}^N$ onto
$(H_{J^c})^\perp$, and let

(7.3) $W := PA|_{\mathbb{R}^J}.$

Then for every $v \in \mathbb{R}^N$, the following identity holds:

(7.4) $\mathrm{dist}(Az, H_{J^c} + v) = \|Wz - w\|_2,$ where $w = Pv.$

Since $|J| = d$ and almost surely $\dim(H_{J^c})^\perp = N - (n - d) = 2d - 1$,
the random matrix $W$ acts as an operator from a $d$-dimensional subspace into
a $(2d - 1)$-dimensional subspace. Although the entries of $W$ are not
necessarily independent, we expect $W$ to behave as if this was the case. To
this end, we condition on the realization of the subspace $H_{J^c}$. Now the
operator $P$ becomes a fixed projection, and the columns of $W$ become
independent random vectors. Then $W$ satisfies a version of Proposition 2.3:

Proposition 7.3. Let $P$ be an orthogonal projection in $\mathbb{R}^N$ of
rank $d$ and let $W = PA|_{\mathbb{R}^J}$ be a random matrix. Then

$\mathbb{P}\big(\|W\| > t\sqrt{d}\big) \le e^{-c_0t^2d}$ for $t \ge C_0$.

Proof. The argument is similar to that of Proposition 2.3. Let $\mathcal{N}$
be a $(1/2)$-net of $S(\mathbb{R}^J)$ and $\mathcal{M}$ be a $(1/2)$-net of
$S(P\mathbb{R}^N)$. Note that for $x \in \mathcal{N}$, $y \in \mathcal{M}$, we
have $\langle Wx, y\rangle = \langle Ax, y\rangle$. The proof is completed as
in Proposition 2.3. □

Using Proposition 7.3, we can choose a constant $K_0$ that depends only on
the subgaussian moment, and such that

(7.5) $\mathbb{P}\big(\|W\| > K_0\sqrt{d}\big) \le e^{-d}.$

With this bound on the norm of $W$, we can run the approximation argument
and prove the distance bound in Lemma 7.2 uniformly over all
$z \in \mathrm{Spread}_J$.



Lemma 7.4. Let $W$ be a random matrix as in Proposition 7.3. Then for every
$t$ that satisfies (7.2) we have

(7.6) $\mathbb{P}\Big(\inf_{z \in \mathrm{Spread}_J}\|Wz\|_2 < t\sqrt{d} \text{ and } \|W\| \le K_0\sqrt{d}\Big) \le (C_2t)^d.$

Proof. Let $\varepsilon = t/K_0$. By Proposition 2.1, there exists an
$\varepsilon$-net $\mathcal{N}$ of $\mathrm{Spread}_J \subseteq S(\mathbb{R}^J)$
of cardinality

$|\mathcal{N}| \le 2d\Big(1 + \frac{2}{\varepsilon}\Big)^{d-1} \le 2d\Big(\frac{3K_0}{t}\Big)^{d-1}.$

Consider the event

$\mathcal{E} := \Big\{\inf_{z \in \mathcal{N}}\|Wz\|_2 < 2t\sqrt{d}\Big\}.$

Taking the union bound and using the representation (7.4) in Lemma 7.2, we
obtain

$\mathbb{P}(\mathcal{E}) \le |\mathcal{N}| \cdot \max_{z \in \mathcal{N}}\mathbb{P}\big(\|Wz\|_2 < 2t\sqrt{d}\big) \le 2d\Big(\frac{3K_0}{t}\Big)^{d-1}(2C_1t)^{2d-1} \le (C_2t)^d.$

Now, suppose the event in (7.6) holds, i.e. there exists
$z' \in \mathrm{Spread}_J$ such that

$\|Wz'\|_2 < t\sqrt{d}$ and $\|W\| \le K_0\sqrt{d}.$

Choose $z \in \mathcal{N}$ such that $\|z - z'\|_2 \le \varepsilon$. Then by
the triangle inequality

$\|Wz\|_2 \le \|Wz'\|_2 + \|W\| \cdot \|z - z'\|_2 < t\sqrt{d} + K_0\sqrt{d} \cdot \varepsilon \le 2t\sqrt{d}.$

Therefore, $\mathcal{E}$ holds. The bound on the probability of $\mathcal{E}$
completes the proof. □

Lemma 7.4 together with (7.5) yield that

$\mathbb{P}\Big(\inf_{z \in \mathrm{Spread}_J}\|Wz\|_2 < t\sqrt{d}\Big) \le (C_2t)^d + e^{-d}.$

By representation (7.4), this is a weaker version of Theorem 7.1, with
$e^{-d}$ instead of $e^{-cN}$. Unfortunately, this bound is too weak for small
$d$. In particular, for square matrices we have $d = 1$, and the bound is
useless.

In the next section, we will refine our current approach using decoupling. 

7.2. Refinement: decoupling. Our problem is that the probability bound
in (7.5) is too weak. We will bypass this by decomposing our event according
to all possible values of $\|W\|$, and by decoupling the information about
$\|Wz\|_2$ from the information about $\|W\|$.

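Proposition 7.5 (Decoupling). Let $W$ be an $N \times d$ matrix whose columns
are independent random vectors. Let $\beta > 0$ and let $z \in S^{d-1}$ be a
vector satisfying $|z_k| \ge \beta/\sqrt{d}$ for all $k \in \{1, \ldots, d\}$.
Then for every $0 < a < b$, we have

$\mathbb{P}\big(\|Wz\|_2 < a,\ \|W\| > b\big) \le 2\sup_{x \in S^{d-1},\, w \in \mathbb{R}^N}\mathbb{P}\Big(\|Wx - w\|_2 < \frac{\sqrt{2}}{\beta}a\Big)\,\mathbb{P}\Big(\|W\| > \frac{b}{\sqrt{2}}\Big).$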

Proof. If $d = 1$ then $\|W\| = \|Wz\|_2$, so the probability in the left
hand side is zero. So, let $d \ge 2$. Then we can decompose the index set
$\{1, \ldots, d\}$ into two disjoint subsets $I$ and $H$ whose cardinalities
differ by at most 1, say with $|I| = \lceil d/2 \rceil$.

We write $W = W_I + W_H$ where $W_I$ and $W_H$ are the submatrices of $W$ with
columns in $I$ and $H$ respectively. Similarly, for the vector $z$, we write
$z = z_I + z_H$.

Since $\|W\|^2 \le \|W_I\|^2 + \|W_H\|^2$, we have

$\mathbb{P}\big(\|Wz\|_2 < a,\ \|W\| > b\big) \le p_I + p_H,$

where

$p_I = \mathbb{P}\big(\|Wz\|_2 < a,\ \|W_H\| > b/\sqrt{2}\big)$
    $= \mathbb{P}\big(\|Wz\|_2 < a \mid \|W_H\| > b/\sqrt{2}\big)\,\mathbb{P}\big(\|W_H\| > b/\sqrt{2}\big),$

and similarly for $p_H$. It suffices to bound $p_I$; the argument for $p_H$ is
similar.

Writing $Wz = W_Iz_I + W_Hz_H$ and using the independence of the matrices
$W_I$ and $W_H$, we conclude that

      $p_I \le \sup_{w \in \mathbb{R}^N}\mathbb{P}\big(\|W_Iz_I - w\|_2 < a\big)\,\mathbb{P}\big(\|W_H\| > b/\sqrt{2}\big)$
(7.7) $\le \sup_{w \in \mathbb{R}^N}\mathbb{P}\big(\|Wz_I - w\|_2 < a\big)\,\mathbb{P}\big(\|W\| > b/\sqrt{2}\big).$

(In the last line we used $W_Iz_I = Wz_I$ and $\|W_H\| \le \|W\|$.)

By the assumption on $z$ and since $|I| \ge d/2$, we have

$\|z_I\|_2 = \Big(\sum_{k \in I}|z_k|^2\Big)^{1/2} \ge \frac{\beta}{\sqrt{2}}.$

Hence for $x := z_I/\|z_I\|_2$ and $u := w/\|z_I\|_2$, we obtain

$\mathbb{P}\big(\|Wz_I - w\|_2 < a\big) \le \mathbb{P}\big(\|Wx - u\|_2 < \sqrt{2}a/\beta\big).$

Together with (7.7), this completes the proof. □
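The lower bound on the norm of the restricted vector $z_I$ is the only place where the spread assumption $|z_k| \ge \beta/\sqrt{d}$ enters the decoupling argument. A minimal sketch of this step follows; the construction of the test vector, the value $\beta = 1/2$, and the helper names are illustrative assumptions, not part of the original proof.

```python
import math
import random

def split_indices(d: int):
    """Split {0, ..., d-1} into I with |I| = ceil(d/2) and the complement H."""
    size_i = (d + 1) // 2
    return list(range(size_i)), list(range(size_i, d))

def restricted_norm(z, idx):
    """Euclidean norm of z restricted to the coordinates in idx."""
    return math.sqrt(sum(z[k] ** 2 for k in idx))

random.seed(0)
d, beta = 7, 0.5
# Raw entries have modulus in [1, 2), so after normalization every
# coordinate satisfies |z_k| > 1 / (2 sqrt(d)) = beta / sqrt(d).
raw = [random.choice((-1.0, 1.0)) * (1.0 + random.random()) for _ in range(d)]
norm = math.sqrt(sum(t * t for t in raw))
z = [t / norm for t in raw]

I, H = split_indices(d)
# ||z_I||_2^2 >= |I| * beta^2 / d >= beta^2 / 2 because |I| >= d/2
print(restricted_norm(z, I), beta / math.sqrt(2))
```

Only the block $I$ with $|I| = \lceil d/2 \rceil$ is checked here; for odd $d$ the complementary block has $\lfloor d/2 \rfloor$ coordinates, so the same computation gives a slightly smaller constant for it.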
We use this decoupling in the following refinement of Lemma 7.4.

Lemma 7.6. Let $W$ be a random matrix as in (7.3), where $P$ is the
orthogonal projection of $\mathbb{R}^N$ onto the random subspace
$(H_{J^c})^\perp$, defined as in Theorem 7.1. Then for every $s \ge 1$ and
every $t$ that satisfies (7.2), we have

(7.8) $\mathbb{P}\Big(\inf_{z \in \mathrm{Spread}_J}\|Wz\|_2 < t\sqrt{d} \text{ and } sK_0\sqrt{d} < \|W\| \le 2sK_0\sqrt{d}\Big) \le \big(C_3te^{-c_3s^2}\big)^d + e^{-cN}.$

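Proof. Let $\varepsilon = t/2sK_0$. By Proposition 2.1, there exists an
$\varepsilon$-net $\mathcal{N}$ of $\mathrm{Spread}_J \subseteq S(\mathbb{R}^J)$
of cardinality

$|\mathcal{N}| \le 2d\Big(1 + \frac{2}{\varepsilon}\Big)^{d-1} \le 2d\Big(\frac{6sK_0}{t}\Big)^{d-1}.$

Consider the event

$\mathcal{E} := \Big\{\inf_{z \in \mathcal{N}}\|Wz\|_2 < 2t\sqrt{d} \text{ and } \|W\| > sK_0\sqrt{d}\Big\}.$

We condition on the realization of the subspace $H_{J^c}$ as above to make the
columns of $W$ independent. By the definition (6.1) of $\mathrm{Spread}_J$, any
$z \in \mathcal{N}$ satisfies the condition of the Decoupling Proposition 7.5
with $\beta = K_1$. Taking the union bound and then using Proposition 7.5, we
obtain

$\mathbb{P}(\mathcal{E} \mid H_{J^c}) \le |\mathcal{N}| \cdot \max_{z \in \mathcal{N}}\mathbb{P}\big(\|Wz\|_2 < 2t\sqrt{d} \text{ and } \|W\| > sK_0\sqrt{d} \mid H_{J^c}\big)$
$\le |\mathcal{N}| \cdot 2\max_{z \in S(\mathbb{R}^J),\, w \in \mathbb{R}^N}\mathbb{P}\Big(\|Wz - w\|_2 < \frac{\sqrt{2}}{K_1} \cdot 2t\sqrt{d} \,\Big|\, H_{J^c}\Big) \cdot \mathbb{P}\Big(\|W\| > \frac{sK_0\sqrt{d}}{\sqrt{2}} \,\Big|\, H_{J^c}\Big).$

Assume now that $\mathrm{LCD}_{\alpha,c}(H_{J^c}^\perp) \ge c\sqrt{N}e^{cN/m}$,
where $\alpha$ and $c$ are as in Theorem 4.3. Then using Proposition 7.3 and
representation (7.4), we conclude as in the proof of Theorem 4.1 that

$\mathbb{P}(\mathcal{E} \mid H_{J^c}) \le 4d\Big(\frac{6sK_0}{t}\Big)^{d-1} \cdot (C't)^{2d-1} \cdot e^{-c's^2d}$

for any $t$ satisfying (7.2). Since $s \ge 1$ and $d \ge 1$, we can bound this as

$\mathbb{P}(\mathcal{E} \mid H_{J^c}) \le \big(C_3te^{-c_3s^2}\big)^d.$

Therefore, by Theorem 4.3,

$\mathbb{P}(\mathcal{E}) \le \mathbb{P}\big(\mathcal{E} \mid \mathrm{LCD}_{\alpha,c}(H_{J^c}^\perp) \ge c\sqrt{N}e^{cN/m}\big) + \mathbb{P}\big(\mathrm{LCD}_{\alpha,c}(H_{J^c}^\perp) < c\sqrt{N}e^{cN/m}\big)$
$\le \big(C_3te^{-c_3s^2}\big)^d + e^{-cN}.$

Now, suppose the event in (7.8) holds, i.e. there exists
$z' \in \mathrm{Spread}_J$ such that

$\|Wz'\|_2 < t\sqrt{d}$ and $sK_0\sqrt{d} < \|W\| \le 2sK_0\sqrt{d}.$

Choose $z \in \mathcal{N}$ such that $\|z - z'\|_2 \le \varepsilon$. Then by
the triangle inequality

$\|Wz\|_2 \le \|Wz'\|_2 + \|W\| \cdot \|z - z'\|_2 < t\sqrt{d} + 2sK_0\sqrt{d} \cdot \varepsilon \le 2t\sqrt{d}.$

Therefore, $\mathcal{E}$ holds. The bound on the probability of $\mathcal{E}$
completes the proof. □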



Proof of the Uniform Distance Theorem 7.1. Recall that, without loss of
generality, we assumed that (7.2) held. Let $k_1$ be the smallest natural
number such that

(7.9) $2^{k_1}K_0\sqrt{d} > C_0\sqrt{N},$

where $C_0$ and $K_0$ are constants from Lemma 2.3 and Lemma 7.6 respectively.
Summing the probability estimates of Lemma 7.4 and Lemma 7.6 for $s = 2^k$,
$k = 1, \ldots, k_1$, we conclude that

$\mathbb{P}\Big(\inf_{z \in \mathrm{Spread}_J}\|Wz\|_2 < t\sqrt{d}\Big)$
$\le (C_2t)^d + \sum_{s = 2^k,\ k = 1, \ldots, k_1}\Big(\big(C_3te^{-c_3s^2}\big)^d + e^{-cN}\Big) + \mathbb{P}\big(\|W\| > C_0\sqrt{N}\big)$
$\le (C_4t)^d + k_1e^{-cN} + \mathbb{P}\big(\|A\| > C_0\sqrt{N}\big).$

By (7.9) and Proposition 2.3, the last expression does not exceed
$(Ct)^d + e^{-cN}$. In view of representation (7.4), this completes the proof. □
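The two absorptions in the final chain of inequalities can be spelled out. The computation below is a sketch in our reading of the constants (with $s = 2^k$, so $s^2 = 4^k$, and with $k_1$ defined by (7.9)); it is added for the reader and is not taken from the original text.

```latex
\sum_{k=1}^{k_1}\big(C_3 t\, e^{-c_3 4^{k}}\big)^{d}
  \le (C_3 t)^{d}\sum_{k\ge 1} e^{-c_3 4^{k} d}
  \le (C_3 t)^{d}\sum_{k\ge 1} e^{-c_3 4^{k}}
  =: (C_3 t)^{d}\,S ,
\qquad
k_1 \le 1+\log_2\frac{C_0\sqrt{N}}{K_0\sqrt{d}} \le 1+\log_2\frac{C_0\sqrt{N}}{K_0} .
```

Since $S < \infty$ depends only on $c_3$ and $S \le \max(S,1)^d$, the first sum is at most $(C't)^d$ with $C' = C_3\max(S,1)$. Since $k_1$ grows at most logarithmically in $N$, the term $k_1e^{-cN}$ is at most $e^{-c'N}$ for $N$ large enough.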

8. Completion of the proof 

In Section 6, we reduced the invertibility problem for incompressible vectors
to computing the distance between a random ellipsoid and a random subspace.
This distance was estimated in Section 7. These together lead to the following
invertibility bound:

Theorem 8.1 (Invertibility for incompressible vectors). Let
$\delta, \rho \in (0,1)$. There exist $C, c > 0$ which depend only on
$\delta, \rho$, and such that the following holds. For every $t > 0$,

$\mathbb{P}\Big(\inf_{x \in \mathrm{Incomp}(\delta,\rho)}\|Ax\|_2 < t\frac{d}{\sqrt{n}}\Big) \le (Ct)^d + e^{-cN}.$

Proof. Without loss of generality, we can assume that (7.2) holds. We use
Lemma 6.2 with $\varepsilon = t\sqrt{d}$ and then Theorem 7.1 to get the bound
$(C't)^d$ on the desired probability. This completes the proof. □

Proof of Theorem 1.1. This follows directly from (5.2), (5.3), and
Theorem 8.1. □